NCAPG Is a Promising Therapeutic Target Across Different Tumor Types

Background: With the advent of CRISPR-Cas9 genome-editing tools in gene therapy, identifying genes that are aberrantly expressed across cancer types is of great value, because a large number of patients may benefit from molecularly targeted gene therapy. The purpose of this study was to identify aberrantly expressed genes across various cancer types and to analyze their potential mechanisms and their correlation with survival outcomes. Results: NCAPG was highly expressed in The Cancer Genome Atlas (TCGA) database, which includes the transcriptomes of 6,647 cancer and 647 normal tissue samples from 16 cancer types. Furthermore, NCAPG overexpression was also predicted at the protein level in 16 tumor types. Importantly, high NCAPG levels were significantly associated with unfavorable survival in various cancer types, including hepatocellular carcinoma (HCC) and breast, lung, and ovarian cancer. Multivariate analyses demonstrated that NCAPG expression, TNM stage, and Barcelona Clinic Liver Cancer (BCLC) stage were independent risk factors for mortality in patients with HCC. Moreover, functional and pathway enrichment analyses suggested that NCAPG was closely correlated with the cell cycle, cellular senescence, and mismatch repair pathways. Using weighted gene co-expression network analysis (WGCNA), we identified NCAPG as a hub gene in the turquoise module, the module most closely related to survival time in HCC samples. Conclusion: To our knowledge, this study represents a comprehensive RNA-Seq analysis of several tumor types, revealing NCAPG as a promising molecular target. NCAPG overexpression may play important roles in the carcinogenesis and progression of tumors by regulating tumor-related pathways, thereby broadening the understanding of pathogenic mechanisms and highlighting the possibility of developing novel targeted therapeutics.


INTRODUCTION
In personalized medicine, identifying tumor-specific targets or tumor-related features that can improve treatment for patients with cancer is of great importance (Andre et al., 2014). Although many molecularly targeted agents that attenuate specific oncogenic driver pathways have been developed, not all patients benefit, because targeting driver pathways is not applicable to all tumor types. CRISPR/Cas9, however, is a promising tool for molecularly targeted gene therapy, as it can activate or repress the expression of genes linked to multiple types of cancer (Sachdeva et al., 2015; Zhan et al., 2018). The identification of aberrant gene expression across tumors is therefore of great significance.
Pan-cancer studies may help identify differentially expressed genes that play vital roles in many cancer types (Cao and Zhang, 2016; Cava et al., 2018). A large sample size and a broad spectrum of cancer types favor the discovery of genes that are aberrantly expressed across cancer types. The Cancer Genome Atlas (TCGA) has performed a comprehensive pan-cancer molecular study of deregulated gene expression.
Non-SMC condensin I complex subunit G (NCAPG), a subunit of the condensin complex, is responsible for the condensation and stabilization of chromosomes during meiosis and mitosis (Murphy and Sarge, 2008). To date, knowledge of the role of NCAPG in tumors remains limited. NCAPG is abundantly expressed in HCC, castration-resistant prostate cancer, and melanoma, and it has been shown to promote HCC proliferation and migration (Ryu et al., 2007; Arai et al., 2018; Liu et al., 2018; Zhou et al., 2018). Interestingly, a genome-wide CRISPR cell growth screen identified NCAPG as a new therapeutic target for HCC (Wang et al., 2017).
In this study, we analyzed a large body of RNA sequencing data covering a broad spectrum of cancer types to identify genes aberrantly expressed across tumor types. NCAPG was found to be overexpressed in multiple tumor types, and we predicted the rate of NCAPG protein overexpression for each tumor type. Because few studies have addressed the expression and role of NCAPG across multiple tumors, we also predicted NCAPG-related pathway activity using bioinformatics tools and, using WGCNA, identified NCAPG as a hub gene in the turquoise module in HCC tissue samples.

Predicting Protein Overexpression of NCAPG With RNA Profiling
For each tumor type, including relevant subgroups, we estimated the percentage of samples with an increased FPKM value for NCAPG, which served as a proxy for protein overexpression. The threshold was defined as the 97.5th percentile of NCAPG FPKM values in healthy tissue samples, and NCAPG was labeled as overexpressed in a tumor sample when its FPKM value exceeded this threshold.
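The thresholding rule above can be written as a short Python sketch. It assumes NCAPG FPKM values are already available as two plain lists (healthy and tumor samples) and uses NumPy's default linear-interpolation percentile; the study does not state which percentile definition was used, and the function name and toy values are hypothetical.

```python
import numpy as np

def overexpression_rate(normal_fpkm, tumor_fpkm, pct=97.5):
    """Return (threshold, percentage of tumor samples above it).

    The threshold is the pct-th percentile of NCAPG FPKM values in
    healthy tissue; a tumor sample counts as overexpressing NCAPG when
    its FPKM value is strictly above that threshold.
    """
    normal = np.asarray(normal_fpkm, dtype=float)
    tumor = np.asarray(tumor_fpkm, dtype=float)
    # NumPy's default percentile interpolates linearly between ranks
    threshold = float(np.percentile(normal, pct))
    rate = 100.0 * float(np.mean(tumor > threshold))
    return threshold, rate

# Toy values for illustration only (not study data)
normal = [1.0, 1.2, 0.8, 1.1, 0.9, 1.3, 1.0, 1.05]
tumor = [0.9, 2.5, 3.1, 1.0, 4.2, 1.4, 5.0, 0.7]
thr, rate = overexpression_rate(normal, tumor)
print(f"threshold = {thr:.4f} FPKM; overexpressed in {rate:.1f}% of tumors")
```

The strict inequality mirrors the wording "above the 97.5th percentile"; by construction, roughly 2.5% of healthy samples would themselves exceed the threshold.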

The Relevant Genes of NCAPG and Functional Enrichment Analysis
Genes correlated with NCAPG (r > 0.4) in five tumor types were identified using LinkedOmics, a web portal for TCGA data. GEPIA is an online web server for analyzing RNA sequencing expression data based on the TCGA database.
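LinkedOmics computes these correlations on its own server; the sketch below only mimics the filtering logic locally. It assumes Pearson correlation (the statistic used by the portal is not stated in the text) and one gene-to-expression-vector mapping per tumor type. The function names and the toy expression values are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)  # assumes neither gene is constant

def correlated_genes(expr, target="NCAPG", r_min=0.4):
    """Genes whose expression correlates with the target above r_min in one cohort."""
    ref = expr[target]
    return {g for g, v in expr.items() if g != target and pearson(ref, v) > r_min}

def common_correlated(cohorts, **kwargs):
    """Intersect the correlated-gene sets of all cohorts (tumor types)."""
    sets = [correlated_genes(expr, **kwargs) for expr in cohorts.values()]
    return set.intersection(*sets)

# Hypothetical toy data: gene -> expression across four samples
cohorts = {
    "LIHC": {"NCAPG": [1, 2, 3, 4], "BUB1": [2, 4, 6, 8],
             "GAPDH": [5, 5.1, 4.9, 5.0], "TOP2A": [1, 3, 2, 4]},
    "BRCA": {"NCAPG": [4, 3, 2, 1], "BUB1": [8, 6, 4, 2],
             "GAPDH": [5, 5, 5.2, 4.8], "TOP2A": [4, 2, 3, 1]},
}
print(sorted(common_correlated(cohorts)))
```

Intersecting per-cohort sets keeps only genes that pass the r > 0.4 filter in every tumor type, which is one way to read "common relevant genes".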
Functional annotation of the NCAPG-related genes comprised Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses, both performed with the R package clusterProfiler. GO terms and pathways with p < 0.05 were considered significant. Enrichment maps visualizing the results were drawn using R (http://www.r-project.org/) and Bioconductor (http://bioconductor.org/).
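clusterProfiler's over-representation analysis rests on a one-sided hypergeometric test. The stand-alone sketch below computes that p value with Python's standard library; it is not clusterProfiler code, it omits the multiple-testing adjustment (Benjamini-Hochberg by default in `enrichGO`), and the gene counts in the example are hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(k, n, K, N):
    """One-sided over-representation p value, P(X >= k).

    X ~ Hypergeometric: N genes in the background, K of them annotated
    to the term, n genes in the query list, k query genes annotated.
    """
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

# Hypothetical counts: 215 query genes, 15 of them in a term of 120 genes,
# against a background of 20,000 annotated genes
p = hypergeom_enrichment_p(k=15, n=215, K=120, N=20000)
print(f"p = {p:.3g}")
```

Python integers have arbitrary precision, so the binomial coefficients stay exact even for genome-sized backgrounds; only the final division is rounded.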
The gene function prediction website GeneMANIA (http://genemania.org/) was used to construct gene-gene interaction networks.

Construction of the WGCNA Co-Expression Network
WGCNA was used to construct the gene co-expression network and identify co-expression modules with the WGCNA package in R (Langfelder and Horvath, 2008). Co-expression methods are typically applied to explore correlations between gene expression levels; because genes in the same pathway tend to show similar expression patterns (Saris et al., 2009), a gene co-expression network is useful for identifying genes with similar biological functions. We removed lowly expressed genes from the HCC RNA-seq data and selected the most variable genes using a standard deviation threshold of SD > 0.75, resulting in 7,373 genes for network construction. The significance threshold for co-expression modules was set at p < 0.05.
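In WGCNA's default unsigned network, the gene-gene Pearson correlation is turned into an adjacency by soft thresholding, a_ij = |cor(x_i, x_j)|^β. The NumPy sketch below reproduces the variance filter and adjacency steps under that assumption; the SD cut-off of 0.75 comes from the text, β = 6 is the power reported in the Results, and the random matrix merely stands in for the HCC data. The package's later steps (topological overlap and module clustering) are not shown.

```python
import numpy as np

def soft_threshold_adjacency(expr, sd_min=0.75, beta=6):
    """expr: samples x genes matrix (the layout of WGCNA's datExpr).

    Keeps genes with sample SD > sd_min, then returns
    (kept column indices, unsigned adjacency |cor|**beta, connectivity k).
    """
    expr = np.asarray(expr, dtype=float)
    # ddof=1 matches R's sd(), i.e. the n - 1 denominator
    keep = np.where(expr.std(axis=0, ddof=1) > sd_min)[0]
    cor = np.corrcoef(expr[:, keep], rowvar=False)  # genes x genes Pearson r
    adj = np.abs(cor) ** beta
    np.fill_diagonal(adj, 0.0)                      # no self-connections
    k = adj.sum(axis=1)                             # whole-network connectivity
    return keep, adj, k

rng = np.random.default_rng(0)
expr = rng.normal(scale=2.0, size=(50, 30))
expr[:, :5] *= 0.1  # five low-variance genes that the SD filter should drop
keep, adj, k = soft_threshold_adjacency(expr)
print(len(keep), adj.shape)
```

Raising |r| to a power keeps strong correlations close to their original value while shrinking weak ones toward zero, which is what pushes the network toward scale-free topology.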

Identification of Hub Genes
Hub genes, which are highly interconnected with other genes in a module, have been shown to play critical roles in tumor development, and the top-ranked genes in each module are considered hub genes. To identify high-degree genes in the protein-protein interaction (PPI) network, network analysis was performed with the Cytoscape plugin cytoHubba.
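cytoHubba offers several ranking methods; the "high degree" criterion used here corresponds to plain node degree. Below is a minimal sketch of degree ranking on an undirected PPI edge list, using the degree ≥ 10 cut-off and the top-60 list size reported in the Results. The function names and the edge list are hypothetical toy data, not cytoHubba's implementation.

```python
from collections import defaultdict

def degree_ranking(edges):
    """Undirected node degree from an edge list, ignoring duplicate
    edges and self-loops; sorted by degree (descending), then by name."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue
        neighbors[a].add(b)
        neighbors[b].add(a)
    degree = {node: len(nb) for node, nb in neighbors.items()}
    return sorted(degree.items(), key=lambda kv: (-kv[1], kv[0]))

def hub_genes(edges, min_degree=10, top_n=60):
    """Genes with degree >= min_degree, truncated to the top_n ranked genes."""
    hubs = [(g, d) for g, d in degree_ranking(edges) if d >= min_degree]
    return hubs[:top_n] if top_n is not None else hubs

# Hypothetical toy network: NCAPG linked to 12 partners (one duplicate edge)
edges = [("NCAPG", f"G{i}") for i in range(12)]
edges += [("G0", "G1"), ("G1", "G2"), ("NCAPG", "G3")]
print(hub_genes(edges))
```

Storing neighbors as sets means a repeated interaction (common when merging PPI sources) does not inflate a gene's degree.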
Two experienced pathologists independently evaluated the staining of the IHC array. Staining intensity was graded as none, weak, moderate, or strong (scored 0-3), and the stained area was likewise scored 0-3 (0, no staining; 1, 1%-30%; 2, 30%-60%; 3, more than 60%). The IHC score was the product of the intensity and area scores, giving a maximum score of 9.
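Because each component ranges from 0 to 3, a maximum score of 9 is reached only when the intensity and area scores are multiplied (3 × 3), the usual intensity × area convention. The sketch below therefore assumes a product; how the 30% and 60% boundaries are assigned is also an assumption, since the text leaves it open. The function names are hypothetical.

```python
def area_score(percent_positive):
    """Map % stained area to 0-3 (boundary handling is an assumption)."""
    if not 0 <= percent_positive <= 100:
        raise ValueError("percentage must be between 0 and 100")
    if percent_positive == 0:
        return 0
    if percent_positive <= 30:
        return 1
    if percent_positive <= 60:
        return 2
    return 3

def ihc_score(intensity, percent_positive):
    """intensity: 0 none, 1 weak, 2 moderate, 3 strong.

    Returns intensity x area score, so the range is 0-9."""
    if intensity not in (0, 1, 2, 3):
        raise ValueError("intensity must be 0, 1, 2 or 3")
    return intensity * area_score(percent_positive)

print(ihc_score(3, 80), ihc_score(2, 45), ihc_score(1, 30))
```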

Extraction of Total RNA and Quantitative Real-Time Polymerase Chain Reaction (qPCR)
Total RNA was isolated from breast cancer tissues and MDA-MB-231 cells using TRIzol reagent (Invitrogen, Thermo Fisher Scientific). The Transcriptor First Strand cDNA Synthesis Kit (Roche Diagnostics) and SYBR Green (Roche Diagnostics) were used for cDNA synthesis and qPCR, respectively. The primer sequences were as follows: NCAPG forward, 5'-GAAGAGGAAGATGGTGGCCT-3', and reverse, 5'-TGTTTATGAGCAGGCACACT-3'; β-actin forward, 5'-GCACCCAGCACAATGAAGAT-3', and reverse, 5'-ACATCTGCTGGAAGGTGGAC-3'. β-actin was used as the endogenous control.
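The text names β-actin as the endogenous control but does not state how relative expression was computed. A common choice is the 2^-ΔΔCt method, sketched below under the assumption that both primer pairs amplify with roughly 100% efficiency. The function name and all Ct values are hypothetical.

```python
def relative_expression(ct_target, ct_ref, ct_target_cal, ct_ref_cal):
    """Fold change of a target gene versus a calibrator sample, normalized
    to a reference gene, by the 2^-ddCt method (~100% efficiency assumed)."""
    d_ct_sample = ct_target - ct_ref        # e.g. NCAPG minus beta-actin, tumor
    d_ct_cal = ct_target_cal - ct_ref_cal   # same difference, adjacent normal
    dd_ct = d_ct_sample - d_ct_cal
    return 2.0 ** (-dd_ct)

# Hypothetical Ct values for one tumor / adjacent-normal pair
fold = relative_expression(24.0, 16.0, 27.0, 16.5)

# Knockdown check: siNCAPG-treated cells versus a control siRNA
remaining = relative_expression(28.0, 16.0, 25.0, 16.0)
print(f"tumor/normal fold change = {fold:.2f}; "
      f"knockdown efficiency = {100 * (1 - remaining):.1f}%")
```

A ΔΔCt of -2.5 corresponds to about 5.7-fold higher expression, and a ΔΔCt of +3 leaves 12.5% of the control level, i.e. 87.5% knockdown.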

Statistical Analysis
GraphPad Prism 8 was used for statistical analysis. Data were evaluated with t-tests and are presented as mean ± standard deviation (SD). P < 0.05 was considered statistically significant.
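The text does not say whether paired or unpaired t-tests were used; for matched tumor and adjacent-normal tissues, a paired test is the natural choice. The sketch below computes the paired t statistic and the mean ± SD summary with Python's standard library only. A two-sided p value would then come from the t distribution with n - 1 degrees of freedom (for example via SciPy's `scipy.stats.ttest_rel`, which is not used here). The function names and values are hypothetical.

```python
import math
import statistics

def paired_t(tumor, normal):
    """Paired t statistic and degrees of freedom for matched samples."""
    if len(tumor) != len(normal) or len(tumor) < 2:
        raise ValueError("need at least two matched pairs")
    diffs = [t - n for t, n in zip(tumor, normal)]
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample SD, n - 1 denominator
    t_stat = mean_d / (sd_d / math.sqrt(len(diffs)))
    return t_stat, len(diffs) - 1

def mean_sd(values):
    """Mean and sample standard deviation, as reported in 'mean ± SD'."""
    return statistics.mean(values), statistics.stdev(values)

# Hypothetical relative expression values for four tissue pairs
t_stat, df = paired_t([3, 5, 4, 6], [1, 2, 2, 3])
print(f"t = {t_stat:.3f}, df = {df}")
```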

Predicted Protein Overexpression of NCAPG by RNA-Seq
RNA sequencing expression data from 4,262 tumor and 2,998 normal samples representing 12 tumor types in the TCGA and GTEx projects were analyzed via GEPIA. After the top 500 most differentially expressed genes of each tumor type were selected, 15 genes were identified as significantly deregulated across the 12 tumor types: NCAPG, ASF1B, BIRC5, CCNB1, CDC20, CENPF, FOXM1, KIAA0101, KIF20A, PBK, PTTG1, RRM2, TK1, TPX2, and UBE2C (Figure 1A, Supplementary Figure 1). These 15 genes were then subjected to GO analysis with clusterProfiler. GO term annotation showed that they were associated with mitotic nuclear division, nuclear division, and organelle fission (BP) and with spindle, spindle pole, and condensed chromosome centromeric region (CC) (Figure 1B). Protein co-expression network analysis then identified 13 key genes with the most network connections: NCAPG, BIRC5, CCNB1, CDC20, CENPF, FOXM1, KIAA0101, KIF20A, PBK, PTTG1, RRM2, TPX2, and UBE2C (Figure 1C). In addition, GeneMANIA analysis revealed that the 15 genes were enriched in mitosis, nuclear division, organelle fission, condensed chromosome, and spindle signaling pathways (Figure 1D), suggesting involvement in cancer progression. We therefore focused on NCAPG, which is closely associated with chromosome condensation and mitosis. NCAPG was upregulated in multiple solid malignancies but downregulated in hematologic malignancies such as acute myeloid leukemia (LAML) (Figures 2A, B). The median number of tumor samples analyzed per tumor type was 337.5 (interquartile range, 145.25 to 466.5), ranging from 57 in uterine carcinosarcoma (UCS) to 1,085 in breast cancer (BRCA).
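GEPIA performs the differential expression analysis itself; the sketch below only illustrates the subsequent selection step. It assumes that each tumor type's genes are ranked by absolute log2 fold change and that the top-n lists are intersected; the ranking criterion, the function names, and the toy fold changes are assumptions for illustration.

```python
def top_genes(log2fc, n=500):
    """The n genes with the largest |log2 fold change| (ties broken by name)."""
    ranked = sorted(log2fc, key=lambda g: (-abs(log2fc[g]), g))
    return set(ranked[:n])

def shared_deregulated(per_tumor, n=500):
    """Genes that appear in the top-n list of every tumor type."""
    sets = [top_genes(fc, n) for fc in per_tumor.values()]
    return set.intersection(*sets)

# Hypothetical log2 fold changes (tumor vs. normal) for two tumor types
per_tumor = {
    "LIHC": {"NCAPG": 3.1, "TOP2A": 2.8, "ALB": -4.0, "GAPDH": 0.1},
    "LUAD": {"NCAPG": 2.5, "TOP2A": 0.3, "ALB": 0.2, "SFTPC": -5.0},
}
print(shared_deregulated(per_tumor, n=2))
```

Ranking by absolute fold change keeps both up- and downregulated genes; tissue-specific genes (such as ALB or SFTPC in the toy data) tend to drop out at the intersection step.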
Analysis of RNA sequencing data from 7,294 samples of 16 tumor types in TCGA showed predicted NCAPG overexpression rates of 74.26% for HCC, 68.96% for breast cancer (BRCA), 73.10% for lung adenocarcinoma (LUAD), 92.81% for lung squamous cell carcinoma (LUSC), 50.11% for colon adenocarcinoma (COAD), 40.93% for bladder urothelial carcinoma (BLCA), 66.2% for head and neck squamous cell carcinoma (HNSC), 65.06% for rectum adenocarcinoma (READ), and 41.18% for kidney renal papillary cell carcinoma (KIRP) (Figure 3 and Supplementary Table 1). The other tumor types were excluded owing to the small number of normal tissue samples.

High NCAPG Was a Prognostic Marker in Several Types of Tumor
Representative IHC staining patterns of NCAPG in HCC, breast cancer, lung cancer, and ovarian cancer were … Furthermore, we validated the expression pattern of NCAPG with microarray data retrieved from GEO (GSE14520, n=471, Figure 5A). To analyze the association of NCAPG with clinicopathological features in 221 clinical cases, X-tile software was used to determine the optimal cut-off value as previously described (Jie et al., 2019). Patients were divided into a low group (NCAPG ≤ 5.1) and a high group (NCAPG > 5.1) for further analysis (GSE14520, n=221). NCAPG expression appeared to differ substantially between groups defined by several clinical features, including cirrhosis, TNM stage, and AFP level (Supplementary Table 2). The overall survival rate was significantly lower in patients with high NCAPG expression than in those with low expression (p < 0.001; Figure 5B). Cox proportional hazards regression analysis was then performed to assess whether the prognostic value of NCAPG was confounded by underlying clinical conditions. Univariate analysis revealed that NCAPG was a significant predictor of overall survival (p = 0.01, Table 1), and multivariate analyses demonstrated that NCAPG, TNM, and BCLC were independent risk factors for mortality in patients with HCC (p = 0.011, Table 1).
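X-tile is a standalone program; the sketch below only illustrates the idea of choosing an expression cut-off by scanning candidate values and keeping the split with the largest two-group log-rank chi-square. It is written from the standard log-rank formulas with Python's standard library and is not X-tile's implementation; the function names, the minimum group fraction, and the toy survival data are hypothetical.

```python
import math

def logrank(times, events, groups):
    """Two-group log-rank test.

    times: follow-up times; events: 1 = death observed, 0 = censored;
    groups: 1 = high NCAPG, 0 = low NCAPG. Returns (chi-square, p value).
    """
    rows = list(zip(times, events, groups))
    o_minus_e, var = 0.0, 0.0
    for t in sorted({t for t, e, _ in rows if e}):
        at_risk = [g for tt, _, g in rows if tt >= t]
        n, n1 = len(at_risk), sum(at_risk)
        d = sum(1 for tt, e, _ in rows if tt == t and e)
        d1 = sum(1 for tt, e, g in rows if tt == t and e and g)
        o_minus_e += d1 - d * n1 / n  # observed minus expected, high group
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if var == 0:
        return 0.0, 1.0
    chi2 = o_minus_e ** 2 / var
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    return chi2, math.erfc(math.sqrt(chi2 / 2))

def best_cutoff(expr, times, events, min_frac=0.1):
    """Scan cut-offs (high = expr > cut) and keep the largest chi-square."""
    n = len(expr)
    best = None
    for cut in sorted(set(expr)):
        groups = [1 if x > cut else 0 for x in expr]
        n_high = sum(groups)
        if min(n_high, n - n_high) < min_frac * n:
            continue  # skip splits that leave a group too small
        chi2, p = logrank(times, events, groups)
        if best is None or chi2 > best[1]:
            best = (cut, chi2, p)
    return best

# Hypothetical toy cohort: higher expression, earlier death
expr = [8, 7, 6, 5, 4, 3, 2, 1]
times = [1, 2, 3, 4, 5, 6, 7, 8]
print(best_cutoff(expr, times, [1] * 8))
```

Because the cut-off is chosen to maximize the statistic, the p value of the winning split is optimistically biased; it should be corrected or checked in an independent cohort before being interpreted.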

Relevant Genes of NCAPG and Functional Annotation of NCAPG
To investigate the function of NCAPG in tumors, we first used LinkedOmics to identify genes correlated with NCAPG (r > 0.4) in five tumor types. The common NCAPG-correlated genes were derived from the expression profiles of 3,091 tumor samples: HCC (n=377), breast cancer (n=1,097), lung adenocarcinoma (n=522), lung squamous cell carcinoma (n=504), and ovarian cancer (n=591) (Figure 6A). These 215 common genes were then subjected to GO and KEGG pathway analysis with clusterProfiler. GO enrichment analysis showed that they were mainly enriched in ATPase activity, tubulin binding, and catalytic activity, acting on DNA (MF); chromosome segregation, nuclear division, and organelle fission (BP); and chromosomal region, condensed chromosome, and spindle (CC) (Figures 6B-D). Furthermore, KEGG pathway annotation revealed that cell cycle, oocyte meiosis, DNA replication, cellular senescence, and mismatch repair were the most significantly enriched pathways (Figure 6E).
FIGURE 3 | NCAPG overexpression rates in many tumor types. NCAPG overexpression rates were determined from RNA sequencing data in the TCGA database. The x-axis shows the percentage of samples with NCAPG overexpression. For each tumor type, the percentage of samples with an increased NCAPG FPKM value was used as a proxy for protein overexpression; the threshold was defined as the 97.5th percentile of NCAPG FPKM values in healthy tissues, and NCAPG was labeled as overexpressed in a tumor sample when its FPKM value exceeded this threshold.

Construction of Weighted Gene Co-Expression Modules
Two hundred and sixty-eight HCC samples with clinical data were included in the WGCNA co-expression analysis (Supplementary Figure 2). We chose a soft-thresholding power of β = 6 to ensure a scale-free network (Supplementary Figure 3). A total of 25 modules were identified, and the connectivity of their eigengenes was analyzed (Figures 7A, B). Multiple modules were related to one or more clinical traits, such as tumor stage, overall survival time, and gender. As shown in Figure 7C, the light yellow, midnight blue, dark green, green yellow, yellow, dark red, grey60, royal blue, purple, turquoise, black, and blue modules were related to three tumor stages; the dark green and green yellow modules were negatively related to overall survival time, whereas the turquoise module only slightly exceeded the significance level; and the magenta, salmon, dark turquoise, yellow, turquoise, blue, and grey modules were related to gender. Taking into account the gender disparity in HCC, we selected the turquoise module, which was related to HCC stage, shorter overall survival time, and gender, for further analysis. The 1,576 genes in the turquoise module were then subjected to GO and KEGG pathway analysis with clusterProfiler (Supplementary Figure 4). Consistent with the results above, KEGG pathway annotation showed that cell cycle, oocyte meiosis, DNA replication, and mismatch repair were the most significantly enriched pathways.
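WGCNA's `pickSoftThreshold` selects the lowest power whose scale-free topology fit (signed R² of log p(k) against log k) reaches a cut-off, 0.85 by default (`RsquaredCut`). The NumPy sketch below follows that idea with equal-width connectivity bins; it approximates, rather than reproduces, the package's `scaleFreeFitIndex`, and the random matrix is a stand-in, so it will not recover the β = 6 chosen for the HCC data. The function names are hypothetical.

```python
import numpy as np

def scale_free_r2(k, n_bins=10):
    """Signed R^2 of log10 p(k) against log10 k.

    Connectivities are split into equal-width bins; each non-empty bin is
    represented by its mean k and its relative frequency."""
    k = np.asarray(k, dtype=float)
    edges = np.linspace(k.min(), k.max(), n_bins + 1)
    bins = np.digitize(k, edges[1:-1])  # bin index 0..n_bins-1
    mean_k, freq = [], []
    for b in np.unique(bins):
        members = k[bins == b]
        mean_k.append(members.mean())
        freq.append(members.size / k.size)
    x = np.log10(np.array(mean_k) + 1e-9)
    y = np.log10(np.array(freq) + 1e-9)
    if x.size < 2 or np.ptp(x) == 0 or np.ptp(y) == 0:
        return 0.0  # fit undefined for degenerate input
    slope = np.polyfit(x, y, 1)[0]
    r = np.corrcoef(x, y)[0, 1]
    # A negative slope (scale-free-like) yields a positive signed R^2
    return float(-np.sign(slope) * r ** 2)

def pick_soft_threshold(expr, betas=range(1, 21), r2_cut=0.85):
    """expr: samples x genes. Returns (lowest beta reaching r2_cut or None,
    {beta: signed R^2})."""
    betas = list(betas)
    abs_cor = np.abs(np.corrcoef(np.asarray(expr, dtype=float), rowvar=False))
    np.fill_diagonal(abs_cor, 0.0)
    table = {b: scale_free_r2((abs_cor ** b).sum(axis=1)) for b in betas}
    chosen = next((b for b in betas if table[b] >= r2_cut), None)
    return chosen, table

rng = np.random.default_rng(1)
power, table = pick_soft_threshold(rng.normal(size=(60, 40)), betas=[1, 2, 4, 6, 8])
print(power, {b: round(v, 2) for b, v in table.items()})
```

Returning `None` when no power reaches the cut-off mirrors the package's behavior of reporting no power estimate, leaving the choice to inspection of the fit curve.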
To explore interactions among the 1,576 genes in the turquoise module, a PPI network was constructed and visualized with Cytoscape (Supplementary Figure 5). In the PPI network, genes with a connectivity degree of ≥ 10 were also defined as hub genes; NCAPG had a connectivity degree of 727. High-degree genes were further ranked with the cytoHubba plugin, and the co-expression network of the top 60 ranked genes in the turquoise module was constructed (Figure 7D). Importantly, NCAPG was identified as a hub gene in the turquoise module.

NCAPG Was Upregulated in Breast Cancer and Affected Cell Proliferation
To verify NCAPG expression in tumors, we extracted RNA from 12 pairs of breast cancer and adjacent normal tissues. qPCR showed that NCAPG was upregulated in breast cancer (Figure 8A), consistent with the analysis of TCGA data. NCAPG protein was also upregulated in breast cancer (Figure 8B, left panel), and quantification of tissue microarray staining of 35 pairs of breast cancer samples confirmed its upregulation in tumor tissues (Figure 8B, right panel). To investigate whether high NCAPG expression plays a crucial role in breast cancer, we suppressed NCAPG expression with siRNAs in MDA-MB-231 cells. qPCR analysis of the knockdown efficiency of siNCAPG1 and siNCAPG2 showed that both siRNAs markedly decreased NCAPG expression (Figure 8C). In colony formation assays, NCAPG knockdown reduced the number of colonies formed by MDA-MB-231 cells (Figure 8D). Moreover, EdU assays showed that inhibiting NCAPG significantly reduced MDA-MB-231 cell proliferation (Figure 8E). Furthermore, NCAPG knockdown increased cleaved PARP levels but decreased the levels of phosphorylated retinoblastoma protein (pRb) and cyclin B1 (Figure 8F).

DISCUSSION
We performed a comprehensive analysis of the TCGA tumor database based on 7,294 clinical samples and identified overexpression of NCAPG across tumor types. We then predicted the NCAPG overexpression rate in multiple tumor types and found that NCAPG was consistently upregulated. Furthermore, NCAPG was a prognostic marker in several types of tumor, including HCC, breast cancer, lung cancer, and ovarian cancer. More importantly, multivariate analyses of the GEO dataset demonstrated that NCAPG, TNM stage, and BCLC stage were independent risk factors for mortality in patients with HCC. In addition, bioinformatics analysis showed that NCAPG-correlated genes were associated with ATPase activity, tubulin binding, and catalytic activity, acting on DNA (MF); chromosome segregation, nuclear division, and organelle fission (BP); and chromosomal region, condensed chromosome, and spindle (CC), and that they were involved in cancer-related signaling pathways, including cell cycle, DNA replication, and mismatch repair, across different cancer types. NCAPG was also identified as a hub gene in HCC samples using WGCNA. Consistent with the analysis of TCGA data, both the mRNA and protein levels of NCAPG were upregulated in breast cancer tissues compared with adjacent tissues. Finally, siRNA-mediated knockdown of NCAPG significantly decreased proliferation of the breast cancer cell line MDA-MB-231.
In our pan-cancer analysis, more than 50% of samples overexpressed NCAPG in high-incidence tumor types such as BRCA, LUAD, and LUSC. Targeting upregulated NCAPG may therefore be a promising new strategy for personalized treatment. CRISPR/Cas9, which can activate or repress the expression of genes linked to multiple types of cancer, is a promising tool for such molecularly targeted gene therapy. The work of Yu Wang et al. shows that NCAPG is a genuine target, identified by CRISPR, for HCC tumor cell growth (Wang et al., 2019). This approach might pave the way for molecularly targeted gene therapy that treats tumors with specific features, such as NCAPG overexpression, irrespective of their origin and location. Because elevated NCAPG levels were a frequent event in the large set of tumors we analyzed, a large number of patients could potentially benefit from targeted inhibition of NCAPG, should such treatment options become available. We believe that comprehensive pan-cancer molecular studies of very large sample sets representing many tumor types are worthy of further investigation and may provide better treatment options for many patients.
Cell cycle and DNA damage response pathway genes are frequently mutated in cancer (Kastan and Bartek, 2004; Strzyz, 2016). Notably, the US Food and Drug Administration has, for the first time, approved a cancer drug on the basis of a tumor biomarker rather than the tumor's site of origin: pembrolizumab, indicated for mismatch repair-deficient or microsatellite instability-high advanced solid tumors (Boyiadzis et al., 2018). It has been reported that knockdown of NCAPG not only reduces HCC cell viability but also induces apoptosis and arrests cells in S phase by regulating the expression of Bax, cleaved caspase-3, E-cadherin, cyclin A1, CDK2, Bcl-2, N-cadherin, and HOXB9 (Wang et al., 2019). Downregulation of NCAPG by miR-99a-3p inhibits cancer cell aggressiveness by decreasing cell proliferation, migration, and invasion in castration-resistant prostate cancer (Arai et al., 2018). Consistent with these findings, our results suggest that NCAPG may regulate cancer-related signaling pathways, including cell cycle, DNA replication, and mismatch repair, across cancer types. NCAPG is therefore a promising target for cancer therapy across cancer types.
NCAPG might promote tumor development by dysregulating the cell cycle, mismatch repair, and cellular senescence. We therefore suggest that the association between NCAPG overexpression and poor prognosis across tumor types may be mediated by these pathways. Taken together, these findings suggest that NCAPG may be involved in carcinogenesis and tumor growth across different cancer types.
RNA sequencing expression data, used here as a screening tool to evaluate differential gene expression across a very large number of samples from several tumor types, have some limitations. Gene expression at the protein level can be affected at many steps, including mRNA stability, translation, and post-transcriptional control such as miRNA-mediated regulation of mRNA stability (du Toit, 2016; Franks et al., 2017). Nevertheless, RNA sequencing data address the question of NCAPG overexpression across tumors more efficiently than large-scale immunohistochemistry (IHC) analyses, which are the most widely used clinical method for determining protein expression. Subsequent IHC validation may therefore be needed.
Analysis of larger sample sizes across tumor types will also enable researchers to identify deregulated genes that are important in driving cancer. This integrative analysis identified aberrantly expressed NCAPG across 16 tumor types, as well as specific signaling pathways regulated by NCAPG. In several cancer types, high NCAPG levels correlated with short survival, possibly through the regulation of cancer-related pathways. Moreover, NCAPG was a hub gene in the turquoise module in HCC tissue.

CONCLUSIONS
In conclusion, this study represents a comprehensive RNA-Seq analysis of several tumor types that reveals NCAPG as a promising molecular target. NCAPG overexpression may play important roles in the carcinogenesis and progression of tumors by regulating tumor-related pathways, including cell cycle, cellular senescence, and mismatch repair. Moreover, high NCAPG levels were significantly correlated with poor survival in patients with several types of cancer, including HCC, breast cancer, lung cancer, and ovarian cancer. Multivariate analyses of the GEO dataset demonstrated that NCAPG, TNM stage, and BCLC stage were independent risk factors for mortality in patients with HCC.

ETHICS STATEMENT
This study was approved by the Institute Research Ethics Committee of the Third Affiliated Hospital of Sun Yat-sen University and was performed in accordance with the approved guidelines.

AUTHOR CONTRIBUTIONS
YC, BH, and QZ designed the research and analyzed the data. CX, JG, YJ, and RL analyzed data and wrote the paper. JC and ZC reviewed the clinical information. All authors approved the final version.