Molecular Profiling Reveals Common and Specific Development Processes in Different Types of Gynecologic Cancers

Background: Gynecologic cancers have become a major threat to women’s health. The molecular biology of gynecologic cancers is not as well understood as that of breast cancer, and precision targeting is still new. Although viewed collectively as a group of cancers within the female reproductive system, they are more often studied separately. A comprehensive within-group comparison of molecular profiles is lacking.
Methods: We conducted a whole-exome sequencing study of cervical/endometrial/ovarian cancer samples from 209 Chinese patients. We combined our data with genomic and transcriptomic data from relevant TCGA cohorts to identify and verify common/exclusive molecular changes in cervical/endometrial/ovarian cancer.
Results: We identified shared molecular features including a COSMIC signature of deficient mismatch repair (dMMR), four recurrent copy-number variation (CNV) events, and extensive alterations in PI3K-Akt-mTOR signaling and cilium component genes; we also identified transcription factors and pathways that are exclusively altered in cervical/endometrial/ovarian cancer. The functions of the commonly/exclusively altered genomic circuits suggest (1) a common reprogramming process during early tumor initiation, which involves PI3K activation, defects in mismatch repair and cilium organization, as well as disruption in interferon signaling and immune recognition; (2) a cell-type specific program at late-stage tumor development that eventually leads to tumor proliferation and migration.
Conclusion: This study describes, from a molecular point of view, how similar and how different gynecologic cancers are, and it provides a hypothesis about the causes of the observed similarities and differences.


INTRODUCTION
Gynecologic cancers were estimated to account for more than 1.3 million new cases worldwide in 2018 (16.5% of all cancers in women), according to world cancer data (1). Surgical resection combined with chemotherapy and/or radiotherapy (in some advanced cases, chemo/radiotherapy alone) remains the mainstay of gynecologic cancer treatment. The molecular biology of this group of cancers is yet to be fully established, which poses difficulties for molecular subtyping and precision targeting. Although the discovery of the predisposing effects of HPV infection has greatly improved the diagnosis and prevention of cervical cancer (CC), there is still a lack of effective screening methods for other gynecologic cancers, mainly endometrial cancer (EC) and ovarian cancer (OC).
How similar are different types of gynecologic cancers, and how are they distinguished from each other? On one hand, they all originate from the Mullerian ducts and all reside within the female reproductive system, which is under the regulation of female hormones (2). On the other hand, they arise from different cell types, have different clinical outcomes (survival, risks of recurrence/metastasis), and are thought to be caused by different mechanisms. For example, squamous cell carcinoma accounts for most CC, adenocarcinoma (from glandular cells) is the major histotype of EC, and serous carcinoma is mostly seen in OC. Unlike CC, which is most likely caused by HPV infection, the majority of EC is thought to be associated with long-term irritation by imbalanced female hormones. OC is generally believed to be the most aggressive gynecologic cancer type, and its cause is controversial, with recent hypotheses suggesting a non-ovarian origin (fallopian tube epithelium) (3). However, few studies have addressed the above questions from a molecular point of view. Although the TCGA molecular study on "Pan-Gyn" (gynecologic + breast) cancers (2) has the largest sample size and the most comprehensive platforms, it included a large number of breast cancers (accounting for more than 40% of total samples) in the Pan-Gyn category, which may have affected the characterization of gynecologic cancer samples. Another study with a relatively small sample size (n = 117, 68 OC + 32 CC + 17 EC) focused on calculating tumor mutational burden (TMB) in Chinese gynecologic cancer patients (4). That study showed that EC has a higher median TMB than CC or OC, and that mutations in PTEN, TSC2, or POLE are associated with increased TMB. To the best of our knowledge, a clear summary of which molecular features are shared/exclusive among the various types of gynecologic cancers is absent from the existing literature.
While it is important to find out what the shared/exclusive molecular features are, it is even more important to understand why they arise. What intrinsic mechanisms drive these closely related cell types to develop into cancers with distinct phenotypes? Are there common processes involved in the development of different types of gynecologic cancers, as their close proximity might suggest? We believe the answers to these questions will help advance our understanding of the development of gynecologic cancers.
We conducted a whole-exome sequencing study in a total of 209 (74 CC, 68 EC, and 67 OC) Chinese gynecologic cancer patients. We examined the mutation landscape of the samples and validated our results with genomic and transcriptomic data from TCGA gynecologic cancer cohorts, namely TCGA-CESC (5), TCGA-UCEC (6), and TCGA-OV (7). Significant consistency was observed between the Chinese and the TCGA data. Similar mutation patterns were found among CC, EC, and OC at all levels (chromosomal changes, mutation signature, signaling pathways, and biological processes), indicating a common reprogramming process of cells at early stages of tumor development. We also identified transcription factors (TFs) and their relevant pathways that were exclusively altered in each cancer type, which suggests a possible cell-type specific program that further shapes each cancer type.

MATERIALS AND METHODS
Samples
We initially included surgically resected tumor samples from 263 sporadic gynecological cancer patients treated at The Sixth Affiliated Hospital of Sun Yat-sen University and The First Affiliated Hospital of Sun Yat-sen University from January 2017 to June 2019. The inclusion criteria for patients were (1) age 20-82 years; (2) initial diagnosis of primary cancer, confirmed by post-operative pathology; (3) previously untreated; (4) over 50% tumor cell content observed in hematoxylin and eosin-stained slides under a microscope. The exclusion criteria were (1) metastatic cancer; (2) ambiguous pathology; (3) accompanying malignant tumors of other organs; (4) failed sample quality or insufficient amount of sample for the experiment. Another two samples of rare cancer types (vaginal cancer and sarcoma) were excluded because the sample sizes were too small. The final data set (n=209) included 74 cervical cancer (CC) cases, 67 ovarian cancer (OC) cases, and 68 endometrial cancer (EC) cases. Clinical information for each case was extracted from medical records, including age at diagnosis, classification and staging (TNM), progression status, and HPV status detected using HPVDetector (8). Informed written consent was obtained from each patient. This study was approved by the Ethics Committee of The Sixth Affiliated Hospital of Sun Yat-sen University. All procedures performed within this study were done in accordance with the Chinese ethical standards and with the 2008 Helsinki declaration.

Whole-Exome Sequencing
All tumor tissue samples were sent to TopGene Medical Laboratory (Zhongshan, China) for whole-exome sequencing. Genomic DNA extraction was performed using the Mag-bind blood and tissue DNA HDQ 96 kit (Omega Bioservices, Norcross, GA, USA), according to the manufacturer's instructions. A UV spectrophotometer (NanoDrop Technologies, Wilmington, DE, USA) was used to check DNA quality. DNA quantification was performed with a Qubit fluorometer 3.0 (Thermo Fisher Scientific, Waltham, MA, USA). Exome capture from the genomic DNA was performed with the AIExomeV1 panel (iGeneTech, China). PCR products were subjected to a quality check with LabChip GX Touch24 (PerkinElmer). Paired-end sequencing was performed using MGISEQ-2000RS. The average depth of each sample was 100X and the read length was 150 bp.

Data Processing and Mutation Analysis
Raw sequencing read QC and filtering were done with Fastp (9). Reads were mapped to the human genome hg19 using BWA MEM (version 0.7.15-r1140) (10). GATK4 (11) was used for read processing and the generation of base-quality recalibrated BAM files. Somatic variants were first detected using GATK4 Mutect2; these variants were then verified with samtools mpileup (12) and SomVarIUS (13) (each variant also had to be detected by at least one of these two callers); variants with population allele frequency > 0.05 were excluded. Somatic variants with allele fraction < 5% were filtered out to reduce false discoveries caused by sequencing errors and the lack of matched normal samples. Germline variants were called using GATK4 HaplotypeCaller, and putative germline variants were marked separately. Driver gene analysis was performed using MutSigCV (14). Copy number variation (CNV) was called using GATK4 with a panel of normals made of 32 normal tissue samples. The default threshold (2.0 z-score of non-log2 copy ratio) was used for the calling. The raw segment files generated by the GATK4 CNV caller were then used as input for GISTIC2.0 (15) to calculate significant copy-number alterations with a threshold of ±0.3. The R package maftools (16) was used for the visualization of mutated genes and the identification of differentially mutated genes with Fisher's exact test (p<0.05). Genomic data of TCGA were downloaded from https://portal.gdc.cancer.gov/, and transcriptomic and survival data were downloaded via the RTCGA R package. Enrichment analysis was done using the online tool WebGestalt (17).
Clinical information for each case is provided in Supplementary Table S1, which includes histology subtypes, age at diagnosis (range 22-82, mean±SD: 51.8±10.3, median: 52), TNM staging, HPV status, and tumor cell differentiation status.
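The three somatic variant filters described under Data Processing and Mutation Analysis (confirmation by a second caller, population allele frequency, and allele fraction) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pipeline used in the study: the `Variant` record, its field names, and the caller labels are hypothetical, and only the two 0.05 thresholds come from the text.

```python
from dataclasses import dataclass

# Thresholds quoted in the text; everything else below is an illustrative assumption.
MAX_POPULATION_AF = 0.05    # variants more common than this in the population are excluded
MIN_ALLELE_FRACTION = 0.05  # variants below 5% allele fraction are filtered out
SECOND_CALLERS = frozenset({"samtools", "SomVarIUS"})  # hypothetical labels


@dataclass(frozen=True)
class Variant:
    """Hypothetical record for one Mutect2 candidate call."""
    key: str                      # any unique identifier for the variant
    allele_fraction: float        # fraction of reads supporting the alternate allele
    population_af: float          # allele frequency in a population database
    confirmed_by: frozenset = frozenset()  # other callers that also report the variant


def passes_filters(v: Variant) -> bool:
    """Return True if a candidate survives all three filters."""
    has_second_caller = bool(v.confirmed_by & SECOND_CALLERS)
    return (
        has_second_caller
        and v.population_af <= MAX_POPULATION_AF
        and v.allele_fraction >= MIN_ALLELE_FRACTION
    )


def filter_variants(candidates):
    """Keep only the candidates that pass every filter, preserving order."""
    return [v for v in candidates if passes_filters(v)]


# Toy candidates (made-up values, for illustration only).
toy_calls = [
    Variant("v1", 0.30, 0.001, frozenset({"samtools"})),
    Variant("v2", 0.30, 0.001),                                       # no second caller
    Variant("v3", 0.02, 0.001, frozenset({"SomVarIUS"})),             # allele fraction too low
    Variant("v4", 0.40, 0.20, frozenset({"samtools", "SomVarIUS"})),  # common in population
]
kept = [v.key for v in filter_variants(toy_calls)]
```

In this toy input only `v1` survives. Requiring agreement between callers trades some sensitivity for specificity, which is a reasonable choice when no matched normal sample is available.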
RESULTS
Distinguishing Characteristics Among Gynecologic Cancers, Validated by TCGA Data
Table 1 summarizes some features that characterize similarities and differences among the three gynecologic cancer types of the studied cohort, validated with genomic and/or transcriptomic data from the gynecologic cohorts of TCGA. Considerable similarities were found between the TCGA and Chinese cohorts for each cancer type, especially for EC, with 60% overlap among the top 10 frequently mutated genes. The Chinese and TCGA data together revealed that EC has the highest average mutation load while OC has the highest average CNV frequency. All three cancer types contained a group of samples with a COSMIC (v3) mutation signature indicative of deficiency in mismatch repair (dMMR); a group of CC samples was enriched with the signature of APOBEC mutagenesis, while EC differed by having a group with a mutation signature indicating defects in the polymerase POLE. The composition of COSMIC signatures for each cancer type was highly similar between Chinese and TCGA data, except that TCGA-OV had a group with a signature of homologous recombination repair deficiency and the Chinese OC cohort (CHI-OC) had a group with an unknown signature. Note that TCGA-OV consisted only of high-grade serous carcinoma, whereas the histotype composition of CHI-OC was much more complicated (Supplementary Table S1). The distinct histotype composition of the two OC cohorts may explain the dramatic difference in TP53 mutation prevalence (~90% in TCGA-OV and ~50% in CHI-OC) as well as in other molecular features (Table 1). Hence the molecular profiles of TCGA-OV and CHI-OC may not be directly comparable.
FIGURE 1 | Mutation landscape represented by the top 30 frequently mutated genes (small-scale mutations + copy number variations) in the studied gynecologic cancer cohort (n=209). Both small-scale mutations and CNV (Amp/Del) were taken into account. Annotations include cancer type of each sample, age groups (age < 45: group 1; 45 ≤ age < 55: group 2; 55 ≤ age < 65: group 3; age ≥ 65: group 4), TNM overall staging, tumor cell differentiation status, HPV status, and mitochondrial copy number variation (MT_CNV). Likely germline mutations are highlighted with white dots.
To identify molecular pathways that distinguish CC, EC, and OC, we selected transcription factor (TF) genes that are exclusively activated/suppressed (amplified/deleted) for each cancer type in the Chinese cohort. We then used the corresponding TCGA transcriptomic data to verify whether the expression of these TFs and their target genes differs significantly among the three gynecologic cancers. Here we define a candidate TF as "exclusively altered" if its expression value in the corresponding cancer type exceeds a fold change of 1.5 (or -1.5) relative to the other two cancer types. Pathways deemed significantly and exclusively altered for each cancer type are summarized in Table 1.
TABLE 1 | [Table body not recovered; one surviving cell reads: Cilium organization; PI3K-Akt-mTOR signaling.] *Average CNV counts for TCGA-UCEC and TCGA-OV are not reported in the publications. For each cancer type, genes that overlap between Chinese and TCGA data are underlined. Transcription factors highlighted in bold in the "Exclusively Altered Pathway" row are those first identified in the Chinese genomic data and then validated with TCGA transcriptomic data. The rest of the genes in this row are known interacting/downstream molecules that showed co-altered expression patterns with these TFs in TCGA transcriptomic data. Note that TCGA-OV contains only high-grade serous carcinoma samples, while CHI-OC has a more complicated composition; therefore the molecular profiles of the two ovarian cancer cohorts may not be directly comparable.
We found 3q amplifications to be a signature of CC samples, resulting in amplification of SOX2 (3q26.33), TP63 (3q28), SHOX2 (3q25.32), and EAF2 (3q13.33) in the Chinese CC cohort (CHI-CC); these genes were also confirmed to be significantly over-expressed in TCGA-CESC (Figure 3A) as compared with TCGA-UCEC and TCGA-OV. The Sox2-p63 complex is known to promote tumor cell survival through up-regulation of GLUT1 (SLC2A1), which drives glucose influx to empower antioxidant production (18); the Sox2-p63-Klf5 complex has been shown to enhance tumor growth by activation of ALDH3A1 (19). Overexpression of these effector genes within the Sox2-p63 pathways in TCGA-CESC as compared with TCGA-UCEC and TCGA-OV is shown in Supplementary Figure S1. Shox2 has been reported as an epithelial-to-mesenchymal transition (EMT) inducer that acts by up-regulating transforming growth factor β receptor I (TβR-I) expression (20). Eaf2 has been shown to activate Wnt3a signaling to protect cells from oxidative stress-induced apoptosis (21). The exclusive activation of Shox2-TβR-I and Eaf2-Wnt3a in TCGA-CESC is also shown in Supplementary Figure S1.
Two TFs, PBXIP1 and CREB3L4 (both at 1q21.3), were found to be exclusively amplified and over-expressed in EC (Figure 3B). Over-expression of PBXIP1 (HPIP) has been shown to inhibit apoptosis by up-regulating BCL2, to promote tumor cell proliferation via activation of ER, and to mediate EMT by regulating mesenchymal genes such as N-cadherin and Vimentin (22). The CREB3L4 transcription factor up-regulates the co-chaperone DNAJC12 (23), which has been proposed as a mediator of gastric cancer progression by regulating proliferation and invasion (24). Supplementary Figure S2 shows the exclusive activation of PBXIP1-regulated genes and the CREB3L4-DNAJC12 axis in TCGA-UCEC.
For OC (Figure 3C), we found exclusive amplification and over-expression of HSF1 (8q24.3), FOXH1 (8q24.3), and ZFPM2 (8q23.1). Hsf1 is known as a master regulator in tumorigenesis that mediates cell survival and EMT via up-regulation of effector genes such as HSPA8 (hsp70), RBM23, and MTA1 (as validated in TCGA-OV, shown in Supplementary Figure S3) (25,26). Foxh1 is a binding partner of Smad2/3/4 proteins, and the Foxh1-Smads complex is an activator of the Nodal signaling pathway, which is required for maintenance of pluripotency (Supplementary Figure S3) (27)(28)(29). ZFPM2 encodes the Fog2 (Friend of Gata, 2) protein, which can interact with Gata2/4/5/6. GATA6 has been shown to up-regulate expression of genes encoding important enzymes (e.g., CYP17A1) for androgen biosynthesis (30). We found overexpression of ZFPM2, GATA6, CYP17A1, and AR (androgen receptor) in TCGA-OV, as compared with TCGA-CESC and TCGA-UCEC (Supplementary Figure S3). All other exclusively altered TFs without known target/functional information are summarized in Supplementary Figure S4.

Characteristics Shared Among Gynecologic Cancers, Validated by TCGA Data
Besides the above-mentioned common recurrent CNV events and the dMMR group found in each cancer type, comparative analysis with TCGA data revealed other commonly altered biological processes/pathways. We performed GO enrichment analysis (over-representation, FDR cutoff 0.05) on frequently mutated genes (altered in >= 5 samples) for CHI-CC, CHI-EC, CHI-OC, and their TCGA counterparts. Commonly enriched biological processes are listed in Supplementary Table S3. The large number of overlapping biological processes indicates a high level of similarity between the two populations and among cancer types. For example, genes within the PI3K-Akt-mTOR pathway were found to be commonly altered in each cancer type, with more than 70% prevalence (Supplementary Figure S5).
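The enrichment step above was run with WebGestalt; the following Python sketch only illustrates the general idea of an over-representation analysis on frequently mutated genes. It assumes a one-sided hypergeometric test with Benjamini-Hochberg correction, which is a common formulation but is not confirmed by the text as the exact procedure used. All function names, gene identifiers, and term labels are hypothetical.

```python
from math import comb


def frequently_mutated(gene_to_samples, min_samples=5):
    """Genes altered in at least `min_samples` distinct samples (cutoff from the text)."""
    return {g for g, samples in gene_to_samples.items() if len(set(samples)) >= min_samples}


def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) when n genes are drawn from N, of which K belong to the term."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def enrich(gene_list, term_to_genes, background, fdr_cutoff=0.05):
    """Return {term: adjusted p-value} for terms passing the FDR cutoff."""
    universe = set(background)
    genes = set(gene_list) & universe
    terms = list(term_to_genes)
    pvalues = []
    for term in terms:
        members = set(term_to_genes[term]) & universe
        overlap = len(genes & members)
        pvalues.append(hypergeom_upper_tail(overlap, len(members), len(genes), len(universe)))
    adjusted = benjamini_hochberg(pvalues)
    return {t: q for t, q in zip(terms, adjusted) if q <= fdr_cutoff}


# Toy universe of 100 placeholder genes and two placeholder terms.
toy_background = [f"g{i}" for i in range(100)]
toy_terms = {
    "term_A": [f"g{i}" for i in range(10)],      # 10 genes, 8 of them in the list
    "term_B": [f"g{i}" for i in range(50, 90)],  # 40 genes, 2 of them in the list
}
toy_gene_list = [f"g{i}" for i in range(8)] + ["g50", "g51"]
toy_enriched = enrich(toy_gene_list, toy_terms, toy_background)
```

In the toy input only `term_A` passes the cutoff, because eight of its ten genes appear in a list of ten, whereas `term_B` contains fewer list genes than expected by chance.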

Statistically Significant Prognostic Factors for Each Cancer Type
We further asked whether the expression level of exclusively altered genes for each cancer type is associated with survival. EAF2 was identified as a strong candidate for CC (p=0.00018), with low expression associated with poor prognosis. EAF2 has been proposed as a prognostic factor in prostate cancer (31). Interestingly, we found in TCGA-CESC that APOBEC high/low (p=0.98) and CNV high/low (p=0.81) were each statistically insignificant for predicting prognosis on their own, but became significant (p=0.017) when combined (Figure 5A), i.e., patients showing consistently high or consistently low levels of both APOBEC and CNV have better survival. PUF60 was one of the TFs exclusively up-regulated in OC (Supplementary Figure S4) and was found to be significantly associated with OC survival (p=0.043; Figure 5B). However, while TCGA-OV transcriptomic data indicated a better outcome for OC patients over-expressing PUF60, others have reported an association of PUF60 overexpression with breast cancer progression through downregulation of PTEN (32). Further verification of the roles of PUF60 in different cancers is awaited. ESR1 and PGR were found to be associated with patient survival in EC (Figure 5C), which is consistent with a previous study (33).
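The combined APOBEC/CNV grouping described above can be illustrated with a short Python sketch. It labels a patient "consistent" when the APOBEC and CNV levels are both high or both low and "opposite" otherwise, and it pairs that grouping with a minimal Kaplan-Meier estimator so the two groups can be compared. The function names and the toy follow-up values are hypothetical; the sketch does not reproduce the survival analysis or the p-values reported in the text.

```python
def concordance_group(apobec_level, cnv_level):
    """Return 'consistent' if both levels are high or both are low, else 'opposite'."""
    allowed = {"high", "low"}
    if apobec_level not in allowed or cnv_level not in allowed:
        raise ValueError("levels must be 'high' or 'low'")
    return "consistent" if apobec_level == cnv_level else "opposite"


def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each time with at least one event.

    times:  follow-up durations, one per patient
    events: True if the event (death) was observed, False if censored
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e)
        leaving = sum(1 for ti in times if ti == t)
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # both deaths and censored patients leave the risk set
    return curve


def split_by_group(patients):
    """patients: iterable of (apobec_level, cnv_level, time, event) tuples."""
    groups = {"consistent": ([], []), "opposite": ([], [])}
    for apobec_level, cnv_level, time, event in patients:
        times, events = groups[concordance_group(apobec_level, cnv_level)]
        times.append(time)
        events.append(event)
    return groups


# Toy follow-up data (made up, in arbitrary time units).
toy_patients = [
    ("high", "high", 1, True),
    ("low", "low", 2, False),
    ("high", "high", 3, True),
    ("low", "low", 4, True),
    ("high", "low", 1, True),
    ("low", "high", 2, True),
]
toy_groups = split_by_group(toy_patients)
toy_curve = kaplan_meier(*toy_groups["consistent"])
```

A formal comparison between the two curves would normally use a log-rank test from a dedicated survival analysis package; that step is left out here to keep the sketch self-contained.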

DISCUSSION
We have performed a series of analyses to study whether there are molecular characteristics shared/exclusive among gynecologic cancers, and more importantly to probe for their intrinsic causes. Using TCGA gynecologic cancer data as a validation, we confirmed that there are considerable similarities and differences among CC, EC, and OC in frequently mutated genes, recurrent CNV events, affected molecular pathways, and biological processes. The molecular similarities shared among gynecologic cancers reflect the close proximity and functional connections among them, while the differences may reflect their distinct cell types of origin (e.g., squamous cells, serous cells, glandular cells). The functions of the shared molecular features of gynecologic cancers reveal their associations with early tumor initiation. According to our results, all three gynecologic cancer types (both Chinese and TCGA) share a dMMR signature, four recurrent CNV events, and extensive alterations in PI3K-Akt-mTOR signaling and cilium component genes. It is well established that accumulated genomic lesions caused by malfunction of DNA repair drive tumorigenesis (34), and dMMR is viewed as a key inducer of tumor initiation. Mutant PIK3CA-induced constitutive PI3K activation has been shown to be essential for tumor initiation in mouse models of breast cancer (35) and to be able to dedifferentiate normal lineage-restricted cells by reactivation of multipotency at an early stage of tumor initiation (36). In line with early PI3K activation, ciliary defects have also been proposed to play key roles in early stages of tumor development. Loss of primary cilia has been observed in breast pre-malignant lesions (37), and loss of motile cilia in the Fallopian tube increases the exposure of epithelial cells to oxidative stress caused by follicular fluid (38).
The recurrent CNV events shared by gynecologic cancers are predicted to cause amplifications of retinal proteins FAM138D and FAM138E, and deletions of genes associated with IFN-α/β signaling, which can be viewed as a strategy of immune evasion. Indeed, the immune system has been shown to rapidly sense oncogene-transformed cells (39); however, instead of killing them effectively, the tumor-associated immune cells may become protective upon interaction with the preneoplastic cells (40). These shared molecular changes suggest a common, non-random reprogramming of cells at the early stages of tumorigenesis. The reprogramming process involves changes in specific chromosome regions, resulting in up/downregulation of genes with key roles in tumor biology, and through these alterations the preneoplastic cells become able to satisfy the minimal requirements for the establishment of a tumor. Future investigations are required to explore the potential of the involved molecules as candidate early biomarkers of gynecologic cancers.
FIGURE 5 | Note that the TCGA-CESC study has defined the levels (high/low) of APOBEC mutagenesis and CNV; here we define "consistent" as consistently high or low in both APOBEC and CNV, and "opposite" as discordant APOBEC and CNV levels.
Unlike the shared molecular changes that are associated with tumor initiation, the functions of exclusively altered pathways for each cancer type suggest roles in satisfying the needs of more advanced, later-stage cancer development, such as the maintenance of stemness, tumor growth, and migration. For CC, this may be at least partially powered by 3q amplifications that lead to the activation of the squamous lineage transcription factors Sox2 and p63, which are master regulators of stem cell pluripotency (19); the activation of Shox2-TβR-I may serve as an inducer of EMT (20). There is a potential link between Sox2 amplification/over-expression and HPV-positivity in vulvar cancer (41), which could also help explain the exclusive Sox2-p63 activation in CC. EC may handle these tasks by activation of PBXIP1/HPIP signaling and the CREB3L4-DNAJC12 axis, and OC by Hsf1, Foxh1, and Fog2. Although different cancer types may activate/inactivate different TFs and pathways, their consequences are ultimately similar (i.e., cell survival, proliferation, and EMT). This suggests that the downstream effector genes of the various TFs could be overlapping or redundant, because they all eventually lead to cancer progression. These exclusively altered drivers indicate the existence of a cell-type specific developmental trajectory, along which different types of pre-malignant cells gradually acquire cell-type specific molecular changes that eventually distinguish them (e.g., mutation load, CNV frequency, significantly altered genes, mutational signatures). The program may confer the ability of self-renewal and infinite proliferation, as well as the ability of tumor cell migration. Prognostic or diagnostic biomarker candidates for specific cancer types could be found among these exclusively altered molecules.
The molecular characteristics of Chinese gynecologic cancers can provide some implications for targeted therapies. More than 10% of the OC samples carried BRCA1/2 mutations, and some more carried mutations in genes involved in homologous recombination repair, rendering these patients potentially sensitive to PARP inhibitors, which are currently an available option for Chinese OC patients (42)(43)(44). Moreover, over 70% of the samples showed alterations in PI3K-Akt-mTOR signaling, which suggests great application potential for PI3K/Akt/mTOR inhibitors.
Our results showed significant consistency with previous studies (2,4). It is important to note that our study was based on tumor-only sequencing, i.e., no matched-normal samples were used. This represents a very common situation in the clinical setting, where matched-normal samples are usually not available. One may question the reliability of the detected somatic variants for each individual sample because of the lack of a normal control. We were fully aware of this concern and added many extra filters (see Materials and Methods for details) to minimize false positives; small-scale mutations of HYDIN were further validated with Sanger sequencing. The principle behind our study is the assumption that false discoveries occur randomly and their effects will be diluted if the sample size is large enough, while true mutations occur specifically in particular regions and accumulate their effects as the sample size grows. The disadvantage of single-sample sequencing is thought to be negligible when focusing only on recurrent (>5% frequency) events. Indeed, the validation by TCGA data has confirmed the accuracy of our Chinese cohort data at the gene/pathway/process level. More efficient analytical tools are still needed for the full exploitation of the large body of tumor-only samples.
In conclusion, we present what is currently the largest molecular characterization of multiple types of Chinese gynecologic cancers. Using relevant TCGA data as a validation, we identified common molecular features among gynecologic cancers, which suggest a common reprogramming process of cells in early tumor initiation. We also identified exclusively altered TFs/pathways for CC, EC, and OC, which indicate a later-stage, cell-type specific tumor development process for each cancer type. From a molecular point of view, we have provided a summary of what is shared and what is not among gynecologic cancers and have given hypotheses about the causes behind these observations. Validation of our findings requires further experimental research and large-scale cohort studies including multiple gynecologic cancer types.
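The recurrence argument above (random false calls are diluted across a large cohort, whereas true events accumulate) can be illustrated with a small Python sketch. The first function keeps only genes altered in more than 5% of samples, as in the text. The second computes, under a simple binomial model that is purely an assumption made here, the chance that an artifact arising independently in each sample with a given probability would still pass that recurrence cutoff. The function names and the per-sample error rate are hypothetical.

```python
from collections import defaultdict
from math import comb, floor


def recurrent_genes(calls, n_samples, min_frequency=0.05):
    """Return {gene: frequency} for genes altered in more than `min_frequency` of samples.

    calls: iterable of (sample_id, gene) pairs; repeated calls in one sample count once.
    """
    samples_by_gene = defaultdict(set)
    for sample_id, gene in calls:
        samples_by_gene[gene].add(sample_id)
    frequencies = {g: len(s) / n_samples for g, s in samples_by_gene.items()}
    return {g: f for g, f in frequencies.items() if f > min_frequency}


def chance_recurrence_probability(n_samples, per_sample_error, min_frequency=0.05):
    """P(an independent random artifact appears in > min_frequency of samples).

    Assumes a binomial model with the same hypothetical error probability in every sample.
    """
    min_count = floor(n_samples * min_frequency) + 1  # smallest count strictly above the cutoff
    p = per_sample_error
    return sum(
        comb(n_samples, k) * p**k * (1 - p) ** (n_samples - k)
        for k in range(min_count, n_samples + 1)
    )


# Toy calls for a cohort of 20 samples (placeholder gene names).
toy_calls = [("s1", "GENE_A"), ("s2", "GENE_A"), ("s2", "GENE_A"), ("s3", "GENE_B")]
toy_recurrent = recurrent_genes(toy_calls, n_samples=20)

# With 209 samples, passing the >5% cutoff requires at least 11 affected samples.
toy_risk = chance_recurrence_probability(209, per_sample_error=0.01)
```

Under this model, an artifact with a 1% per-sample rate very rarely reaches 11 of 209 samples. Systematic artifacts that recur at the same site across samples would violate the independence assumption and are not covered by this calculation.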

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found below: https://bigd.big.ac.cn/gsa-human/browse/HRA000294.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the Ethics Committee of The Sixth Affiliated Hospital of Sun Yat-sen University. The patients/participants provided their written informed consent to participate in this study.