Emerging Biomarkers in Bladder Cancer Identified by Network Analysis of Transcriptomic Data

Bladder cancer is a very common malignancy. Although new treatment strategies have been developed, the identification of new therapeutic targets and reliable diagnostic/prognostic biomarkers for bladder cancer remains a priority. Generally, they are found among differentially expressed genes between patients and healthy subjects or among patients with different tumor stages. However, the classical approach includes processing these data taking into consideration only the expression of each single gene regardless of the expression of other genes. These complex gene interaction networks can be revealed by a recently developed systems biology approach called Weighted Gene Co-expression Network Analysis (WGCNA). It takes into account the expression of all genes assessed in an experiment in order to reveal the clusters of co-expressed genes (modules) that, very probably, are also co-regulated. If some genes are co-expressed in controls but not in pathological samples, it can be hypothesized that a regulatory mechanism was altered and that it could be the cause or the effect of the disease. Therefore, genes within these modules could play a role in cancer and thus be considered as potential therapeutic targets or diagnostic/prognostic biomarkers. Here, we have reviewed all the studies where WGCNA has been applied to gene expression data from bladder cancer patients. We have shown the importance of this new approach in identifying candidate biomarkers and therapeutic targets. They include both genes and miRNAs and some of them have already been identified in the literature to have a role in bladder cancer initiation, progression, metastasis, and patient survival.


INTRODUCTION
Bladder cancer (BCa) is the ninth most prevalent malignant disease globally, with more than 400,000 new cases diagnosed each year, especially in males and elderly, and the thirteenth most common cause of cancer death. Five-year survival rate for early stage BCa patients reaches 95.7%, while for metastatic patients it is just 5% (1). Recent advancements in therapy include immunotherapy with PD-1 antibody pembrolizumab and the PD-L1 antibody atezolizumab which have yielded better results compared to chemotherapy (2). The most common BCa type is transitional cell carcinoma, also called urothelial carcinoma, while squamous cell carcinoma and adenocarcinomas are rare. For the majority of new BCa patients, non-muscle-invasive BCa (NMIBC) is diagnosed. This low-grade tumor often recurs and in about 20% of cases it progresses to high grade, i.e., to the muscle-invasive BCa (MIBC) which is more likely to develop metastases (3). In addition, molecular data, including chromosomal aberrations, mutation rates, presence of mutated tumor suppressor genes, gene, and miRNA expression levels have led to the definition of specific molecular subtypes. This depicts BCa as a molecularly and clinicopathologically heterogeneous disease. Diagnosis can be based on cystoscopy, microscopy, voided urinary cytology, blood detection in the urine and assessment of urine-based tumor markers, such as complement factor H-related protein (BTA test) and nuclear matrix protein 22 (NMP22) (4). However, the sensitivity and specificity of these markers decrease in the presence of inflammatory cells and other contaminating cells in the sample (5). Therefore, the identification of new biomarkers could improve diagnostic or prognostic performance of BCa tests. Also small extracellular vesicles released by BCa are currently being investigated, since they are known to be involved in cancer growth, progression and metastatic spread and the molecules contained in these vesicles may be potential BCa biomarkers (6,7).
An effective method to discover biomarkers is transcriptome profiling, by microarrays or more recently by RNA-seq, that allows the determination of differentially expressed genes and non-coding RNAs under different conditions. For example, the comparison of the expression levels of thousands of genes in healthy vs. cancer tissue samples, or among different tumor stages or tumors under different treatments, has led to the identification of several differentially expressed genes. They could represent candidate therapeutic targets or biomarkers for tumor onset, progression, or prognosis. However, over the past few years, a major drawback has emerged regarding differential expression analysis. In particular, differentially expressed genes are treated individually and this does not allow the identification of coregulation mechanisms among them. Highlighting these gene interactions is important, because they are often altered in complex genetic disorders like cancer (8)(9)(10). This emerging systems biology method focuses on the analysis of gene regulation alterations allowing a better understanding of cancer onset and its progression and the identification of critical cancer driver genes. Moreover, these clusters of co-expressed genes, that constitute the regions of a complex gene network, often correspond to cellular pathways (8)(9)(10).
Currently, a widely used approach to process gene expression data and investigate network alterations is the Weighted Gene Co-expression Network Analysis (WGCNA), that draws gene networks where the connections among pairs of genes are identified and weighted based on their correlated expression levels across multiple samples (11). Briefly, after processing the expression profiles into weighted connections, WGCNA can identify the network topology and, by using the topological overlap dissimilarity as the measure of distance among genes, it allows the identification of sub-networks, called modules [for further details see (12)]. Therefore, only highly co-expressed genes (i.e., connected with strong weights in the network) can compose the gene modules. It is also possible to relate these modules to clinical traits of interest. For example, regarding the comparison of two distinct networks deriving from tumor and normal gene expression data, WGCNA can identify the modules and genes belonging to them that reflect the regulatory alterations related to transcriptional changes. In particular, the most interconnected genes in a module (hub genes) are often functionally important and thus they could play a key role in cancer and represent candidate diagnostic and prognostic biomarkers or potential therapeutic targets (13)(14)(15).

EMERGING BCA BIOMARKERS IDENTIFIED BY THE WGCNA METHOD
In this review, we focused on all studies where the WGCNA method was applied to analyze gene expression data deriving from BCa samples ( Table 1).
Recently, WGCNA was applied on publicly available microarray gene expression data deriving from 93 BCa patients in order to identify genes related to different tumor stages (Ta, T1, T2, T3, and T4) and therefore to suggest potential prognostic biomarkers (16). A network module was highly correlated with tumor progression and further processing was performed in order to identify which cellular pathways were represented by the module genes. In general, since not all genes of a pathway are present in a module, the term "enrichment" is used to indicate how significantly a pathway overlaps with a module. A functional enrichment analysis is usually performed on a list of genes in order to reveal these enriched pathways. The identified module was significantly enriched in genes belonging to important biological processes ( Table 1). Within this module, four hub genes were identified: COL3A1, COL5A2, FBN1, and POSTN and, by using independent expression datasets, the COL3A1 (Collagen type III α1 chain) gene was validated as both a diagnostic and prognostic biomarker. In fact, it was upregulated in BCa tissues compared to normal ones and its high expression strongly correlated with tumor progression, shorter overall survival (OS) and disease-free survival (DFS) times. While there are few literature data about COL5A2 and FBN1 genes, the role of POSTN (periostin) in BCa has been widely investigated. In particular, high grade BCa showed very low levels of POSTN gene and the artificial restoration of its expression suppressed cell invasiveness and metastasis (22). It was also shown that the decrease of invasiveness and the inhibition of the epithelial-to-mesenchymal transition (EMT) were due to bladder-specific upregulation of the E-cadherin expression by periostin (23).
Lately, microarray gene expression data of 165 primary BCa samples were analyzed by WGCNA in order to identify genes correlated with TNM staging and OS (17). In total, 11 modules correlated with TNM staging and they were enriched in genes belonging to cell proliferation associated pathways ( Table 1). A filtering step using protein-protein interaction network information collected in STRING tool (https:// string-db.org/) resulted in 17 genes with a potential role in BCa (Table 1). Moreover, 11 hub genes could be considered as prognostic biomarkers, since their lower expression was associated with better OS of BCa patients. Some identified hub genes had been previously investigated. In particular, the high expression of AURKB (Aurora Kinase B), implicated in cancer through development of aneuploidy and chromosomal instability, correlated with advanced BCa stages (24). Similarly, the mitotic checkpoint protein BUB1B, known to contribute to chromosomal instability, was over-expressed in advanced BCa stages and correlated with high cell proliferation (25). High expression levels of CDC25B (Cell division cycle 25B) were associated with advanced stages, recurrence and poor prognosis in BCa patients (26). Cyclin B2 (CCNB2) expression level was higher in cancer than in normal bladder mucosa and its downregulation inhibited cell migration, invasion, and metastatic abilities (27). Also the role of FOXM1 (Forkhead box M1) has been widely investigated, indeed it was found to be a reliable prognostic biomarker for MIBC (28) and its high expression level correlated with TNM stage, histological grade, metastases, and poor prognosis in BCa patients, whereas its down-regulation through miR-24-1 inhibited cell proliferation, migration, and invasion (29,30). Since CCNB2 and FOXM1 suppression resulted in BCa inhibition, they can be considered as reliable therapeutic targets and deserve further exploration. Moreover, MMP11 (Matrix metalloproteinase-11) overexpression correlated with very aggressive phenotypes (advanced pT status, nodal metastasis, high histological grade) and with unfavorable clinical outcomes (31). Serum concentration of TK1 (Thymidine kinase 1) gene was associated with tumor stage, degree of invasion, and metastasis (32). TOP2A (Topoisomerase-IIA) free DNA in urine has been identified as a diagnostic biomarker and its levels can also distinguish NMIBC from MIBC (33). Its over-expression has also been associated with high-grade and high-stage BCa and with high rates of recurrence in NMIBC (34). Moreover, TOP2A protein levels have been identified as a predictor of DFS times (35). TPX2 (microtubule nucleation factor) gene was upregulated in tumor tissues compared to normal bladder samples and it was strongly associated with pT status, high histological grade, lymph node metastasis, and shorter OS time. Indeed its overexpression promoted proliferation and tumorigenicity and suppressed apoptosis (36). UBE2C (Ubiquitin conjugating enzyme E2) upregulation was associated with high BCa stages, presence of lymphovascular invasion, progression to MIBC, and it has been suggested as a biomarker of unfavorable prognosis (37,38).
In an experiment where single-cell transcriptomics was applied to squamous cell carcinoma of urinary bladder (SCCB), 75 tumor cells and 18 normal cells were isolated from cancer and normal control fresh resected tissues from one patient (18). Then, gene expression analysis was performed by single-cell RNA-seq. WGCNA analysis identified five large modules enriched in genes belonging to several cancer-associated pathways ( Table 1). While some of the identified hub genes ( Table 1) have rarely been reported in previous cancer studies, some of them have already been suggested as BCa biomarkers. For example, CTNND2 (Catenin delta 2) gene is frequently amplified in BCa, with high copy numbers (39). Moreover, copy number variations of PCSK6 (Proprotein convertase subtilisin/kexin type 6) gene have been reported to be a prognostic marker for NMIBC progression (40). Intriguingly, POU2F3 (POU class 2 homeobox 3) is a transcription factor expressed in stratified squamous epithelia and related to squamous epithelial stratification (41), so it could play a role in squamous cell BCa (SCCB).
Microarray gene expression profiles of 9 normal bladder and 9 transitional cell carcinoma tissue samples have been analyzed by the WGCNA method (19). A differential co-expression network analysis was carried out and eight hub genes were identified ( Table 1). Interestingly, among the identified hub genes, the rs762551 polymorphism in CYP1A2 (a cytochrome P450 family member) gene was associated with decreased BCa risk (42). Molecular mechanisms explaining this association are still undetermined, however since it lies in the first intron of the gene, it could alter pre-mRNA splicing processing at 5 ′ UTR level, transcription regulation, or protein folding (43)(44)(45)(46)(47)(48)(49)(50)(51).
Recently, different gene expression profiles between NMIBC and MIBC have been investigated using public microarray expression data from 8 and 11 snap frozen cancer tissues, respectively (20). WGCNA analysis showed significant correlations between three modules and the tumor stage. In particular, these modules were enriched in genes mainly involved in cell cycle ( Table 1). Among them, 16 hub genes have been identified (Table 1). Therefore, they can be involved in BCa progression from NMIBC to MIBC phenotype. Many hub genes have been previously suggested as candidate diagnostic or prognostic BCa biomarkers. For example, TRAK1 (trafficking kinesin protein 1) gene has been identified as a favorable prognostic marker, since its low level expression was associated with poorer survival (52). Polymorphisms in CYP3A5 (a cytochrome P450 family member) can define a subset of BCa patients who better respond to cabazitaxel and temsirolimus, in terms of lower toxicity and higher efficacy (53,54).
However, regarding the last two studies (19,20), it should be noted that WGCNA was performed using sample sizes that were too small, that is 8, 9, or 11 samples for each condition. Indeed, at least 20 samples are required for a robust WGCNA analysis and to obtain reliable results.
Recently, also miRNA expression data of 418 BCa and 19 normal tissue samples collected in The Cancer Genome Atlas (TCGA) have been investigated (21). After the selection of differentially expressed miRNAs, WGCNA analysis allowed the identification of a module closely related to BCa progression. Thirteen downregulated miRNAs ( Table 1) also had a prognostic value since their low expression levels were associated with poorer OS in BCa patients. Interestingly, the predicted targets of these miRNAs were found to be significantly enriched in cancerrelated pathways ( Table 1). It has been shown that miR-1-1 and miR-133a are under-expressed in BCa cells and act as tumor suppressive miRNAs since, when overexpressed, they inhibited cell proliferation and invasion and increased apoptosis (55). In particular, miR-1 can exert its tumor suppressive function by targeting both coding genes, for example CCL2 (56), and noncoding RNAs, including UCA1 (57). MiR-133a can also induce apoptosis through silencing of GSTP1 in BCa cell lines (58) and, along with miR-1, it can inhibit BCa cell proliferation and increase apoptosis by targeting TAGLN2 mRNA (59). Moreover, miR-133a and miR-145 suppress cancer cell proliferation by directly regulating FSCN1 expression (60). MiR-133b plays a key role in proliferation and apoptosis by silencing Bcl-W and AKT1 genes (61) and its downregulation is associated with BCa progression and poor prognosis (62). Also miR-139 is a tumor suppressive miRNA since it can inhibit BCa cell proliferation by targeting the BMI1 oncogene (63) and cell migration and invasion by silencing matrix metalloprotease 11 (MMP11) (64). It has been observed that miR-143 can inhibit tumor cell proliferation, it is associated with BCa resistance to gemcitabine (65) and, along with miR-145, is a good prognostic biomarker for BCa patient survival (66). Interestingly, the polymorphism rs353293 in the common promoter of miR-143 and miR-145 is associated with BCa risk (67). Finally, miR-195 suppressed cancer cell proliferation by silencing GLUT3 (68), CDK4 (69), and CDC42 (70) genes. Therefore, these tumor suppressive miRNAs can be considered for in vivo delivery of therapeutics in bladder cancer, even though there are still challenges to the development of miRNA delivery strategies without toxicity induction.

FURTHER ANALYSES ON IDENTIFIED HUB GENES AND MIRNAS
We performed a Gene Ontology analysis using all hub genes identified by WGCNA in the reviewed studies. Overall, they were found to belong to related biological processes. In particular, we highlighted pathways such as cell cycle, mitosis, mitotic spindle organization, kinetochore assembly, and nuclear division. All these processes are involved in the uncontrolled cell proliferation in cancers. In addition, by using the transcription factor binding data generated by the ENCODE project consortium (www.encodeproject.org), we identified the E2F4 and FOXM1 transcription factors as the master regulators of hub gene expression, since they resulted as being significantly overrepresented in the promoters of hub genes (p = 5.892e-9 and p = 3.220e-8, respectively). Moreover, in order to investigate whether relationships exist between these miRNAs and the hub genes listed in Table 1, we performed a comprehensive literature search and identified which hub genes are targets of the hub miRNAs. In order to increase the analysis stringency, we considered only the experimentally assessed miRNA targets. Results reported in Table 2 show that nearly all hub miRNAs target at least one hub gene, therefore these miRNAs could be involved in the deregulation of hub genes. Alternatively, hub miRNAs and genes could be deregulated due to the alteration of master regulators, such as transcription factors.

CONCLUSIONS
WGCNA is a recently developed method for the analysis of gene expression data able to propose candidate therapeutic targets or diagnostic/prognostic biomarkers. Here, we reviewed all studies where WGCNA has been applied for the analysis of expression data from BCa. They include analyses of gene and miRNA expression data. Notably, neither lncRNA expression or splicing isoform-specific RNA expression have yet been investigated in BCa by WGCNA, as recently carried out, for example, in pancreatic cancer (15) and in clear cell renal cell carcinoma (ccRCC) (86), respectively. In addition, expression data of circular RNAs and RNAs in exosomes have not been analyzed by WGCNA in any cancer type. Recently, this network strategy has also been applied to proteomic and metabolomic data, although, due to the low coverage of proteomic and metabolomic analytical methods, WGCNA needed to be modified (87). Moreover, since RNA-seq data simultaneously allow the gene expression measure and mutation analysis, it would be interesting to perform an evaluation of the mutation effects on the gene co-expression module assignment of a gene. Although WGCNA requires expression data from at least 20 samples in order to obtain reliable results, only 8, 9, or 11 samples for each condition were analyzed in two reviewed studies (19,20). Generally, in order to overcome the problem of a small sample size, researchers use more than one expression dataset, particularly during analysis of microarray data. However, systematic and technical differences between different microarray platforms and datasets (called batch effects) could emerge. The correct data pre-processing is needed since WGCNA is sensitive to batch effects. Moreover, the presence of outliers (samples with very different expression profiles form the bulk) may affect WGCNA results, thus their removal is a critical step (12). Unfortunately, sometimes researchers tend to reject few outliers because of the small sample size. A further critical element for WGNCA analysis is the highly variable gene expression among samples due to tumor heterogeneity. To overcome this problem, large sample size expression datasets should be used and, in particular, datasets that include samples isolated from different points of a single cancer tissue should be preferred. Moreover, we suggest performing sample clustering based on expression data before WGCNA analysis, in order to process less heterogeneous samples, as recently carried out for ccRCC (88).
High BCa heterogeneity, different microarray platforms, specific setting of WGCNA algorithm and different patient therapies, lifestyle, stages, age, and sex could explain the little overlap between key genes identified among the studies. However, since these hub genes are highly co-expressed, they could highlight a common mechanism of transcriptional and post-transcriptional regulation, thus revealing mechanistic insights into cancer development. In particular, we identified the hub gene FOXM1 as the master regulator of the other genes identified by WGCNA. Hub genes could also be related at a functional level, since they belong to similar pathways and are regulated by a few common hub miRNAs. Furthermore, these miRNAs are known to be involved in the same pathways of hub genes, thus supporting the role in BCa of genes and miRNAs identified by WGCNA. Finally, although currently no hub gene has sufficient validity for clinical practice, many of them are already known to serve as biomarkers in BCa. Therefore, these genes are worth being further explored, since they can shed light into the molecular mechanisms of BCa, thus leading to the definition of novel personalized therapies. For example, regarding biomarkers for chemotherapy efficacy and toxicity, WGCNA identified the cytochrome P450 family member CYP3A5, already found to be useful in defining BCa patients with better responses to treatments.

AUTHOR CONTRIBUTIONS
MG and FP conception and design. MG, GO, ARi, and FP drafting the manuscript. GO, ARi, MB, and AC review of the literature. ARu, EC, TC, and GP critical revision of the manuscript.