A Novel Set of WNT Pathway Effectors as a Predictive Marker of Uterine Corpus Endometrial Carcinoma–Study Based on Weighted Co-expression Matrices

Uterine corpus endometrial carcinomas (UCEC) are clinically divided into two subgroups—endometrioid endometrial carcinoma (EEC) or non-endometrioid endometrial carcinoma (NEEC). The first group shows relatively better prognosis. However, the discrimination seems to be insufficient due to the fact that in the mildest EEC are patients with poor treatment response and bad prognosis. Our aim was to examine the molecular background of such phenomenon and whether gene expression patterns might be of importance for the clinic. We focused our analysis on WNT pathway target genes since it is one of the main regulators of endometrial proliferation and differentiation. In silico analysis of TCGA data, including Weighted Co-expression Network Analysis, Principle Component Analysis, and Multiple Factor Analysis, allows to select 28 genes that serve as a predictive markers for UCEC patients. Our study revealed that there is a subgroup of the endometrioid cases that molecularly resembles mixed/serous groups. This may explain the reason for existence of subgroup of patients, that although clinically diagnosed with the mildest endometrioid UCEC type, yet present failure in treatment and aggressive course of the disease. Our study suggests that worse outcome in these patients may be based on a disruption of proper WNT signalling pathway resulting in deregulation of its effector genes. Moreover, we showed that mixed group consisting of tumours containing both endometrioid and serous types of cells, has serous expression profile of WNT targets. The proposed gene set allows to predict progression of the disease trough dividing patients into groups of low or high grade with 70.8% sensitivity and 88.6% specificity (AUC = 0.837) as well as could predict patient prognosis associated with UCEC subtype with 70.1% sensitivity and 86.2% specificity (AUC = 0.855). Relatively small number of implicated genes makes it highly applicable and possibly clinically simple and useful tool.


INTRODUCTION
Uterine Corpus Endometrial Carcinoma (UCEC) is one of the most common gynaecological cancers which can spread outside the uterus through myometrium invasion (1). Although most of UCEC cases are low grade and have a favourable prognosis, there is a subgroup of highly malignant tumours which accounts for high mortality rate.
In 1983 Bokhman et al. demonstrated two distinct pathways of uterine malignancies development. Endometrioid endometrial carcinoma (EEC) represents about 70-80% of all UCEC cases and is observed in young pre-and post-menopausal women. EEC is typified by expression of oestrogen (ER) and progesterone (PR) receptors, so it is thought to be influenced by endocrine modulation (oestrogen unopposed by progesterone) and follows the oestrogen-related pathway (2,3). Non-endometrioid endometrial carcinoma (NEEC) is diagnosed in relatively older, postmenopausal women, in the background of an atrophic endometrium without association with oestrogen stimulation. NEECs are highgrade tumours characterized by serous papillary or clear-cell morphology, as well as aggressive clinical course and poor prognosis, resulting from high potential for deep myometrial invasion and lymphatic spread. NEEC tumours are characterized by poor differentiation and high risk of deep myometrial invasion (2)(3)(4)(5)(6).
In 2013, the Cancer Genome Atlas (TCGA) Research Network published the molecular classification based on the analysis of copy number, microsatellite instability, genome sequencing, and exome sequence. It divides UCEC tumours into four group: POLE ultramutated, MSI hypermutated, copy-number (CN) low, and CN high presenting different risk for disease progression (7)(8)(9).
In these molecular based groups, a significant histopathological diversity is observed. For example, the best prognosing POLE group surprisingly contains both NEEC and high grade EEC phenotypes but with good prognosis (10). This indicates the need for research in order to search for molecular classifiers for better prediction of the course of the disease.
We decided to use an integrated expression analysis to find groups of genes correlated with specific prognosis. Our goal was to create a relatively small set of genes whose expression would predict the patients' prognosis independently of the grade and histological type. We focused on WNT signalling pathway as one of the pivotal intracellular regulator.
Wnt/β-catenin signalling pathway is deregulated in 40% of endometrial cancers. This signalling track is essential for maintaining balance in the proliferation and differentiation of endometrium through sex hormone regulation during the menstrual cycle (11)(12)(13).
Contrary to most of the studies, we concentrated on the downstream WNT effector proteins whose expression is regulated by TCF/LEF transcription factors.

Downstream Effects of Wnt Signalling in UCEC
As Wnt alterations affect aggressiveness of UCEC we focused on downstream effects of the pathway through consideration of 2,574 target genes of Wnt-specific transcription factors (TCF/LEF family). This study aimed to identify molecular signatures within Wnt endpoint effects that may be associated with histological subtypes and grade of UCEC. Hereby applied WGCNA algorithm transforms the adjacency matrix into the distance matrix of co-expression, based on topological overlap, using in this study soft thresholding power of seven followed by gene hierarchical clustering and dynamic tree-cutting algorithm. This algorithm reveals clusters of genes (hereinafter referred to as modules) of a common expression profile in correlation with clinical traits of UCEC. Thus, it was possible to define nine distinct modules of different sizes (black: 50 genes, blue: 442 genes, brown: 183 genes, green: 107 genes, pink: 39 genes, red: 60 genes, turquoise: 492 genes, yellow: 147 genes, grey: 1053 genes) that were correlated with clinical traits such as histological subtype and grade. The most numerous grey module was discarded from further analyses as it includes all genes that do not share any pattern of expression. Supplementary Figure 1 presents the topological overlap matrix accompanied by hierarchical clustering dendrograms and module colours (referred as TOM plot), which indicates genes of high connectivity that tend to be essential for studied clinical traits. MEs considered as the most representative gene expression profiles were calculated for each module as the first principle component of the standardized expression profiles within a particular module to identify sets of genes reflecting tumour histology or grade in a significant manner. According to the module-trait relationship plot we found turquoise and yellow modules showing the highest positive correlation with both clinical traits (histology: r = 0.23, p = 6e − 6, r = 0.38, p = 5e−14, respectively, and grade: r = 0.44, p = 3e−19, r = 0.22, p = 2e−05, respectively) (Supplementary Figure 2). Both of them were also of the highest GS regarding UCEC grade (Supplementary Figure 3A) and histology (Supplementary Figure 3B), however the yellow module has been eventually omitted due to insufficient association between module membership (MM), defined as correlation metric of gene expression profile with ME, and GS (Supplementary Figure 4). Finally, the expression profiles of 492 turquoise module genes were visualized in a form of heatmaps and noteworthy, all of them showed two types of visible associations: firstly, molecular differentiation between UCEC grades ( Figure 1A) and secondly, molecular differentiation between histological subtypes revealing a novel subgroup of UCEC patients consisting of endometrioid, mixed and serous histologies ( Figure 1B). In order to associate the genes of turquoise module with biological processes that they might be involved in, enrichment analysis within GO BP was performed. Among enriched terms, several important findings emerged indicating potential causes of clinical differences between identified subtypes of UCEC or grades that may stem from alterations in the cell cycle, proliferative and differentiation processes as well as cellular death and DNA repair processes (Figure 2).

Dimensional Grouping of UCEC Patients
According to Identified Molecular Profiles Associating Histology and Grade-Principle Component Analysis (PCA) and Multiple Factor Analysis (MFA) We employed PCA to partition UCEC patients within two principle components according to the molecular profile of 492 turquoise module genes in association with both, histology and grade. We found clear partitioning of individuals across the first dimension with a total variance of 33% (Figures 3A,B). PCA was followed by MFA, which served to identify genes of the highest contribution to spatial partitioning of patients and select genes differentiating clinical groups in the most significant manner. This resulted in selection of 28 variables that differentiated individuals across the first principle component to the extent of 46.1% of the total variance that reflected molecular differences between low and high grades, and simultaneously between endometrioid and mixed groups of endometrioid, mixed and serous cancer subtypes (Figures 1C,D, 3C,D).

Selected Effectors of Wnt Pathway May Serve as Predictive Markers of UCEC Prognosis-Binomial Model of Logistic Regression and Receiver Operating Characteristic (ROC) Curve
We developed a predictive marker through the binomial model of logistic regression based on the combined expression of selected 28 genes in association with a defined molecular subtype. According to ROC, the developed marker could predict a level of UCEC progression through dividing patients into groups of low or high grade with 70.8% sensitivity and 88.6% specificity (AUC = 0.837) as well as could likewise predict patient prognosis associated with UCEC subtype with 70.1% sensitivity and 86.2% specificity (AUC = 0.855) (Figure 4). Moreover, the prognostic potential of presented marker has been confirmed with Kaplan-Meier plots for both, overall survival (OS), and disease-free survival (DFS). Interestingly, the split of UCEC patients was of much higher accuracy in comparison to standard OS and DFS according to grade (Figure 5). Detailed statistics are shown in Table 1.

Functional Network of Wnt Effectors Reveals Biological Differences Between Novel UCEC Subtypes-Network Analyst
Finally, we performed the analysis of functional networks within downstream effectors of Wnt that were altered in molecular subtypes of UCEC to identify main biological processes involved in increased UCEC aggressiveness. Global network was constructed with 286 nodes and 455 edges (FDR < 0.05) that revealed overrepresentation of several pathways and biological processes essential in malignant carcinomas. In the network, we identified 9 genes involved in Hedgehog signalling pathway, 10 genes of ErbB signalling, 7 genes associated with VEGF pathway, and 15 genes involved in MAPK cascade. Moreover, many adhesion-associated processes were enriched such as the adherens junction, focal adhesion, gap junction, and regulation of actin cytoskeleton that are known to accelerate tumour progression and invasiveness. Noteworthy, the molecular signature of endometrial cancer was likewise identified (Figure 6).

Significance of Mutations/CNVs in Wnt Pathway Members
Of 42 major Wnt pathway players we found only 5 of high significance regarding mutations and CNVs (Supplementary Table 1). They varied in number of altered cases between novel molecular subgroups of UCEC; specifically, APC, CTNNB1B, LRP5P, and TCF4F were found mutated mostly in group of endometrioid endometrial cancer cases and MYC was the only gene of altered copy number with a predominance in group of endometrioid, mixed and serous endometrial cancer cases.

Validation of the Results
Cross-validation of the findings based on independent UCEC cohorts was not performed due to lack of relevant expression data in association with specific clinical information.

DISCUSSION
Endometrial cancer is one of the most common gynaecological cancers with morbidity rates tending to increase (14). The majority of endometrial tumours present low stage and grade with favourable prognosis, although some fractions of patients experience different course of disease and significantly worse outcome (15). Nowadays, there are two main classification systems of endometrial cancer. The former created by Bokhman et al. is a historical classification based on the epidemiological, clinical and endocrine relationship that divides UCEC into two types: estrogen-related endometrioid endometrial carcinoma (EEC) and non-endometrioid endometrial carcinoma (NEEC). However, this division does not reflect the real nature of the endometrial malignant pathology and does not include clear-cell carcinoma, carcinosarcomas or other undifferentiated carcinomas (4). Since 2013, a new, TCGA-based, molecular classification of endometrial cancer has been provided consisting of four genomic groups that vary in clinical outcome. The  group with an increased number of gene copies (CN high) was characterized by the lowest survival rate (survivability at 50% after 5 and 10 years); the best-performing group was POLE (survivability at 100%), while the MSI hypermutated group and the CN low group had a similar survival rate of 80% (8,9). Interestingly, the molecular and classical Bokhman's classification systems do not overlap; for example, EEC grade 3 cancers do not constitute a separate molecular subgroup, since they are randomly distributed across all four TCGA-based groups (7). In our study, we employed WGCNA approach to identify genetic profiles associated with specific unfavourable prognosis of UCEC that could biologically explain such phenomenon. Our goal was to determine a relatively small set of genes that could be used to predict patients' prognosis independently of grade and histological type. We focused on the Wnt signalling pathway as one of the pivotal intracellular regulators of endometrium proliferation and differentiation, very often deregulated in endometrial adenocarcinoma.
In fact, alterations in the Wnt pathway are commonly observed in endometrial cancer at different signalling levels, e.g., ligands, main effectorβ-catenin, APC and Axins that affect executive transcription factors followed by deregulation of downstream effector target genes (11)(12)(13)(16)(17)(18). Therefore, unlike most of the reported studies, we decided to analyse downstream effects of Wnt disruption. The analysis was conducted on TCGA data collected from 370 UCEC patients. We identified a total of nine co-expression modules; primarily two of them were correlated with clinical characteristics such as grade and histological type of UCEC, although eventually the turquoise module was of the highest significance (cor > 0.4, p = 3e−19) that corresponded to 492 differentially expressed Wnt target genes (Supplementary Figure 3B). GO analysis revealed that they were mainly involved in the cell cycle, response to cellular stress and DNA damage, differentiation and proliferative processes, cell architecture and communication, as well as regulation (Figure 2). Noteworthy, these findings showed key alterations resulting from the disrupted signalling via Wnt pathway that may modify patients' prognosis due to the molecular profile of more aggressive features. Moreover, deregulation of these processes clearly refers to hallmarks of cancer such as acquired selfsufficiency in growth signals, sustaining proliferative signalling, evading apoptosis, and resisting cell death that comprise major biological capabilities of carcinogenesis (19,20). Other alterations associated with the cellular architecture, communication, and epithelium proliferation may reflect potential for epithelial-tomesenchymal transition (EMT), which contributes to increased migratory potential of cancer cells and UCEC aggressiveness (21).
Furthermore, we selected a set of 28 genes from the turquoise module which mostly contributed to clear dimensional partitioning of UCEC patients demonstrating distinct molecular profiles of mixed grade 1 and grade 2 vs. grade 3 and simultaneously endometrioid endometrial adenocarcinoma vs. mixed group consisting of endometrioid, serous, and mixed endometrial adenocarcinoma patients (Figures 1C,D, 2C,D). To date, relevance of several of these genes was demonstrated in different studies on endometrial cancer ( Table 2), however, of the greatest interest are genes which to our best knowledge have not been yet described elsewhere including XPO1O, SASS6S, NUP1P07, UBXN2NB22B, NOL1L1, DPM1M, COMMD2D, MED3D0, GINS4S, CENPO, NIF3Fl33l1l, LRCC4C2, AFG3GL33L2L, DDX2X7, PPIB, LMF1F, and ZSWIM7MM7.
Additionally, these genes have shown potential towards prediction of UCEC prognosis independently of histology or grade with sensitivity over 70% and specificity exceeding 85%, thus enabling to determine a group of patients with poor outcome (Figure 4). Interestingly, the prognostic potential was confirmed independently of any clinicopathological parameters through survival analysis with Kaplan-Meier plots. It showed that both OS and DFS were differentiated much more   accurately that with standard grade classification (Table 1, Figure 5). Finally, through the analysis of functional networks of Wnt downstream targets we identified several biological processes including, among others, Hedgehog, ErbB, VEGF signalling, and adhesion-associated processes considered as crucial in endometrial tumourigenesis that differed between novel molecular UCEC subgroups (Figure 6). Furthermore, we compared novel UCEC subgroups regarding occurrence of mutations and CNVs affecting major player of Wnt pathway (Supplementary Table 1) and found predominance of APC, CTNNB1B, LRP5P, and TCF4F mutations in the group endometrioid endometrial cancer cases as well as MYC copy number alteration in the group of endometrioid, mixed and serous endometrial cancer cases. These genes are well-known either tumour suppressors or oncogenes, nevertheless in this case they remain of unknown clinical significance. These findings confirmed biological significance of the previously chosen set of genes correlating with unfavourable prognosis and provided insight into mechanisms of UCEC progression. In summary, the presented analysis showed that current histological classification of UCEC into endometrioid, mixed, and serous endometrial adenocarcinoma is not adequate. It seems that there is a group of endometrioid cancer cases which molecularly resembles mixed/serous group. This may explain the existence of clinically observed group that despite favourable diagnosis such as the mildest endometrioid UCEC type yet presents failure in treatment and aggressive course of the disease. Our findings thus highlight a significant role of Wnt signalling in endometrial tissue and indicate that worse outcome may result from alterations in the signalling which affect its downstream effectors involved in many cellular processes. Moreover, we showed that despite an optimistic prognosis some patients experience very adverse course of endometrioid endometrial cancer resulting from the serous genetic profile of Wnt targets. We therefore proposed the set of 28 genes that allow to predict UCEC clinical outcome, which makes it a highly applicable and potentially useful tool of high clinical value. Finally, this study revealed that experimental validation of such phenomenon should be of the greatest research attention since there is no available independent cohort that allow to compare the results.

Input Data Acquisition and Preparation
For purposes of this study, we acquired data from The Cancer Genome Atlas Uterine Corpus Endometrial Carcinoma (TCGA-UCEC data status of 28th Jan 2016) comprising RNAseq expression profiles of 20502 genes (level 3 RNAseqV2V, RSEM normalized) and corresponding clinical data of UCEC patients. Furthermore, both data types were combined followed by exclusion of patients with missing expression/clinical information that resulted in a total of 370 patients qualified for further analyses. The detailed characteristics of UCEC patients are shown in Supplementary Table 2.
All other methods regarding biospecimen procurement, isolation and sequencing of RNA are previously described by The Cancer Genome Atlas Research Network (33).

Selection of Wnt Downstream Effectors
Targets of Wnt signalling pathway transcription factors were selected through the GTRD database (Gene Transcription Regulation database) offering collection of uniformly processed ChIP-seq data to identify transcription factor binding sites for human (34). We focused on well-known Wnt transcription factors belonging to TCF/LEF families such as LEF1F, TCF3F, and TCF4F (35) that could be found in GTRD. Finally, the input list included 2,571 genes that were recognized as Wnt downstream effectors.

Weighted Gene Co-expression Network Analysis (WGCNA)
Module detection was conducted through applying the WGCNA algorithm available as a collection of R functions enabling the performance of weighted correlation network analysis. In brief, for each gene pair the Pearson's correlation was computed followed by the transformation into adjacency matrix using the soft-thresholding approach (power = 7) with the scalefree topology index (R 2 ) > 0.9 (Supplementary Figures 5, 6). Subsequently, the topological overlap matrix (TOM) was constructed based on overlap between pairs of interconnected genes, which afterwards served as the input for hierarchical clustering (hclust function, "average" agglomeration method) and module detection by applying the dynamicTreeCut algorithm with a minimum module size of 30 genes. Detected modules were related to clinical traits of interest (histological subtype of UCEC, grade) by correlating module eigengenes (MEs) with external traits. In addition, gene significance (GS) for each module was determined as a linear regression p-value transformed by log10 between gene expression and clinical trait of interest. Finally, module significance (MS) was determined as average GS for all genes included in the module. Heatmaps visualizing expression profiles within particular modules were prepared with gplots R package (heatmap.2 function, row clustering according to the Pearson's distance metric and "complete" agglomeration method).
Dimensional Partitioning of UCEC Individuals-Principle Component Analysis (PCA) and Multiple Factor Analysis (MFA) Dimensional grouping of UCEC patients according to a set of selected genes to determine the relevance of histological subtype and grade of UCEC was performed using Principle Component Analysis (PCA). Primary PCA involved 492 genes of turquoise module as active variables with colouring of individuals regarding grade and histological subtype. Subsequently, Multiple Factor Analysis (MFA) was conducted to determine the contribution of particular genes into partitioning of individuals across first and second dimensions with the turquoise module gene set as an active variable and clinical traits as supplementary variables (explanatory variables do not participate in creating dimensions in an active fashion, they are visible only for explanatory purposes). By applying the fviz_contrib function we selected 28 genes, whose contribution to PC1C was found above the reference line (corresponds to the expected value if the contributions of all active variables were equal, thus variables above the reference line are considered as significantly contributing to a given dimension) followed by the second PCA involving the selected set of genes. All analyses were performed using FactoMineR and factoextra R packages offering a wide range of functions dedicated for multivariate exploratory data analyses and result visualization.

Building the Binomial Model of Logistic Regression and Receiver Operating Characteristic (ROC) Curve
To determine a predictive value of selected 28 genes, we built a binomial model of logistic regression by fitting generalized model that was specified by giving a symbolic description of identified molecular subgroups 0-253 patients with endometrioid endometrial adenocarcinoma and 1-117 patients with endometrioid, mixed, and serous endometrial adenocarcinoma) with regards to the gene set followed by prediction from the results of model fitting. Subsequently, the predictive potential of the novel marker was evaluated with the receiver operating characteristic (ROC) curve in the context of patient prognosis regarding clinical characteristics and aggressiveness of UCEC. The model was built with the glm R function with prediction from glm. ROC was plotted by employing pROC R package. Survival analysis showing predictive potential of novel marker independently of clinicopathological parameters was performed with univariate Cox regression model in R (coxph function). Kaplan-Meier plots were drawn with ggplot2t and survmisc R packages.

Functional Network of Differentially Expressed Wnt Downstream Effectors Corresponding to Molecular Subtypes of UCEC-Network Analyst
NetworkAnalyst is a freely available tool allowing to perform integrative analysis of gene expression data with visual exploration of resulting functional networks (36). Hereby we used Wnt effector list as a whole (2,574 genes) with patients labelled according to the classification of their molecular profile regarding UCEC subtype (253 endometrioid endometrial adenocarcinomas vs. 117 endometrioid, mixed and serous endometrial adenocarcinomas). The analysis was performed with default settings, auto-scaling of RNAseq counts and the FDR cutoff set as 0.05. Network analysis was conducted with proteinprotein interactions method, STRING confidence score cutoff set as 900 with requirement of experimental evidence and zeroorder network. Functions of identified genes were defined with KEGG database.

Mutation/Copy Number Variation Analysis
Additionally, we performed analysis of mutations and copy number variation (CNV) in UCEC patients according to novel molecular subgroups. Mutations/CNVs were obtained from cBioPortal repository (37,38)

DATA AVAILABILITY
The datasets analysed for this study can be found in the Broad Institute TCGA GDAC Firehose Repository (https://gdac. broadinstitute.org/).