Four Novel Prognostic Genes Related to Prostate Cancer Identified Using Co-expression Structure Network Analysis

Prostate cancer (PCa) is one of the most common malignancies for males, but very little is known about its pathogenesis. This study aimed to identify novel biomarkers associated with PCa prognosis and elucidate the underlying molecular mechanism. First, The Cancer Genome Atlas (TCGA) RNA-sequencing data were utilized to identify differentially expressed genes (DEGs) between tumor and normal samples. The DEGs were then applied to construct a co-expression and mined using structure network analysis. The magenta module that was highly related to the Gleason score (r = 0.46, p = 3e–26) and tumor stage (r = 0.38, p = 2e–17) was screened. Subsequently, all genes of the magenta module underwent function annotation. From the key module, CCNA2, CKAP2L, NCAPG, and NUSAP1 were chosen as the four candidate genes. Finally, internal (TCGA) and external data sets (GSE32571, GSE70770, and GSE141551) were combined to validate and predict the value of real hub genes. The results show that the above genes are up-regulated in PCa samples, and higher expression levels show significant association with higher Gleason scores and tumor T stage. Moreover, receiver operating characteristic curve and survival analysis validate the excellent value of hub genes in PCa progression and prognosis. In addition, the protein levels of these four genes also remain higher in tumor tissues when compared with normal tissues. Gene set enrichment analysis and gene set variation analysis for a single gene reveal the close relation with cell proliferation. Meanwhile, 11 small molecular drugs that have the potential to treat PCa were also screened. In conclusion, our research identified four potential prognostic genes and several candidate molecular drugs for treating PCa.


INTRODUCTION
Prostate cancer (PCa) is the second most frequent malignancy and the fifth leading cause of death in males throughout the world (Bray et al., 2018). Currently, prostate biopsy has become the standard for diagnosing PCa worldwide. Meanwhile, prostate-specific antigens (PSA) are considered a reliable prostate tumor marker, especially in the early stages of PCa. Despite the wide use of PSA tests in screening for PCa, this approach has some restrictions. In several non-malignant cases, such as those with prostatitis and benign prostatic hyperplasia, serum PSA frequently increases, affecting the accuracy of the PSA test (McDonald et al., 2014). In addition, many patients with widespread metastases from PCa showed poor differentiation or have neuroendocrinal differentiation on histology with typically low PSA levels (Caram et al., 2016). Moreover, physicians and patients often overestimate the abilities of PSA testing, which leads to overdiagnosis and overtreatment of indolent PCa (Ciatto et al., 2000;Dall'Era et al., 2008;Bryant et al., 2012). Furthermore, there are no clear serum PSA levels that assist in assessing a patient with PCa. On the other hand, it is well known that PCa is a serious threat to the health of males, especially those with advanced stage, drug resistance, neoplasm recurrence, and tumor metastasis, always contributing to death even after combined treatment. Therefore, a novel and specific biomarker for PCa needs to be explored.
Since the advent of microarray and high-throughput sequencing technology, bioinformatics has played a significant role in many fields, especially in the medical field (Gu and Chen, 2014;Li et al., 2014;Huang and Huang, 2015;Turei et al., 2015). In recent years, more potential biomarkers have been discovered. However, the vast majority of studies focus only on the differences in expression between different samples, and interactions among the genes have been largely neglected (Song et al., 2018;Pudova et al., 2019;Zheng et al., 2019).
To further investigate the underlying connection and relative importance of each gene, we applied structure network algorithms, which identify function-specific modules based on network topological importance (Bu et al., 2020). A weighted gene correlation network analysis (WGCNA) was constructed, and genes with similar expression profiles were clustered into the same module. Then, the correlation between the module and clinical phenotype was analyzed to choose the module that is most significantly related to the clinical disease phenotype. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses of genes within the key module were conducted to explore the potential functions. After a series of screenings, four hub genes (CCNA2, CKAP2L, NCAPG, and NUSAP1) that could truly predict the progression and prognosis of PCa were found in our study. Gene set enrichment analysis (GSEA), and gene set variation analysis (GSVA) were used to investigate potential biological functions. Meanwhile, a variety of databases, such as Gene Expression Omnibus (GEO), The Cancer Genome Atlas (TCGA), and HPA, were utilized to verify the genes in different methods, such as survival analysis or receiver operating characteristic (ROC) curve. Also, TIMER was applied to explore the immune infiltration of the hub genes. In addition, cBioPortal was used to assess genetic alterations of those four genes. Finally, the Connectivity map (CMap) was used to further screen the correlation of small molecule drug targets.

Data Acquisition and Study Design
The RNA-sequencing (RNA-seq) and clinical information for PCa were acquired from the TCGA database 1 , which included 498 PCa samples and 52 normal prostate samples. The data sets of TCGA were used to screen differentially expressed genes (DEGs); perform WGCNA; verify the hub genes; and perform GSEA, GSVA, and survival analysis in our study. Simultaneously, all microarray data sets and other information were downloaded from the GEO database of the NCBI databases 2 . The data set GSE32571, which was obtained from the Affymetrix Human Gene 1.0 ST, was used to verify the expression of hub genes between normal and tumor samples. The GSE70770 was used for expression profiling based on GPL10558 (Illumina HumanHT-12 V4.0) and comprised 207 tumor samples. The data set was utilized to explore the expression of hub genes with different Gleason scores and stages. Finally, another data set of GSE141551, which was obtained on the Illumina Human HT-12 WG-DASL V4.0, was used as a testing set to further verify our results. The cohorts that did not undergo WGCNA analysis were selected as internal training validation sets, and the other data sets that have undergone WGCNA analysis were used as external validation data sets. Therefore, TCGA was chosen as the training and internal validation data sets, whereas GSE32571, GSE70770, and GSE141551 were used as external validation data sets. Detailed information on these data sets is shown in Table 1, and the workflow of our research is presented in Figure 1.

Data Preprocessing and DEG Screening
All microarray data of GEO data sets were subjected to quality control, background correction, logarithmic conversion, and removal of batch effects processing, using the R package "limma" (Ritchie et al., 2015). The samples without clinical data were then excluded from subsequent analysis. The RNA-seq data of TCGA were normalized with the "DESeq2" R package. The three R packages-DESeq2, analyzed based on a negative binomial distribution method (Love et al., 2014); limma, based on linear models and empirical Bayes methods (Ritchie et al., 2015); and edgeR, based on an overdispersed Poisson model (Robinson et al., 2009)-were then utilized to screen the DEGs between normal and cancer samples. Finally, the overlapping DEGs by adjusting P < 0.05 were considered as target genes and further analyzed.

WGCNA
The DEGs were constructed based on a weighted gene coexpression using the R package "WGCNA" (Langfelder and Horvath, 2008). First, the function "goodSamplesGenes" in the "WGCNA" package was used in maintaining the RNA-seq data from overlapping DEGs if they proved to be good samples. Then, the outlier samples were removed to ensure that the results of network construction are more reliable. Second, a matrix of similarity was constructed by Pearson's correlation analysis of all pairs of genes. After that, the matrix was performed to construct a scale-free co-expression network based on an optimal soft threshold power β (Chen et al., 2018). A similar matrix was then transformed into a topological overlap matrix (TOM). This TOM could measure the network connectivity of a gene, which was defined as the sum of its adjacency to all other networkgenerated genes (Botia et al., 2017). At the same time, the average linkage of hierarchical clustering was analyzed by the TOM-based

Identify Significant Relevant Module and Module Functional Annotation
The correlation between the modules and the clinical phenotypes was analyzed by the module-trait relationship analysis of WGCNA. Then, the module that is most relevant to the clinical phenotype was found. Here, the magenta module that significantly connects with the Gleason score of PCa was chosen. The "clusterprofiler" (Yu et al., 2012) package in R was used to perform GO function annotation and KEGG pathway enrichment analysis and visualized by the R package "GOplot" (Walter et al., 2015).

Screen and Validation of Hub Genes
After choosing the interesting module, gene significance (GS) of >0.3 and module membership (MM) of >0.9 as the threshold for screening hub genes in the magenta module were set. The data sets TCGA were set as internal validation data sets, and the data sets GSE70770 and GSE141551 were selected as external validation data sets. All these were used to analyze the expression differences of hub genes with different Gleason scores and tumor stages. In addition, the expression of hub genes between PCa and adjacent tissues were confirmed by the data set GSE32571. One-way analysis of variance (ANOVA) or Student's t-test was applied to measure the statistical significance of the calculated results. Furthermore, survival analysis was conducted for hub genes using R packages "survminer" and "survival." To assess the diagnostic values of these genes, ROC curves were plotted and the area under the ROC curve (AUC) with the "pROC" R package was calculated (Robin et al., 2011).

GSEA and GSVA
The R package "clusterprofiler" (Yu et al., 2012) was utilized to perform GSEA analysis of hub genes with TCGA-PRAD RNAseq data. Moreover, the "GSVA" (Hänzelmann and Guinney, 2013) R package was used to find the pathways that are mostly associated with the hub genes. In this analysis, 498 PCa samples were divided into two groups (high vs. low expression) based on the median expression of each hub gene. P < 0.01 and log fold change of >0.15 were considered significant. The reference gene sets of GSEA and GSVA, which was called as "c2.cp.kegg.v6.2.symbols.gmt, " were downloaded from the Molecular Signatures Database 3 .

Analysis of Tumor-Infiltrating Immune Cells
TIMER 4  is an online tool that provides a comprehensive resource for systematical analysis of immune infiltrates across different types of cancers. We herein chose six tumor-infiltrating immune cells (B cells, CD4 + T cells, CD8 + T cells, macrophages, neutrophils, and dendritic cells) to investigate their correlation with the expression of selected hub genes.

Genetic Alteration of Hub Genes
The cBioPortal for cancer genomics 5 , which is an open cancer genomics platform, analyzes, visualizes, and provides the service of downloading the data from multidimensional cancer genomics data sets (Cerami et al., 2012). The users can explore the genetic changes in different samples and genes.
Here, the cBioPortal was utilized to investigate genetic alterations associated with hub genes.

Related Small Molecule Drugs Screening
The CMap 6 is an open resource that utilizes a variety of gene expression profiles for connecting small molecules, genes, and diseases (Lamb et al., 2006;Subramanian et al., 2017). First, small molecule drugs were used to process the human cells. After that, differential expression of genes in the magenta module was used to screen some molecule drugs that show high correlation with the disease. Finally, the enrichment score of each molecule drug was calculated, which ranged from -1 to 1. Also, a positive connective score suggests that a drug could induce signaling biology in a specific disease. In contrast, a negative connective score indicates that a drug could prevent signaling biology. To further investigate the association between small molecules and hub genes, a molecular docking simulation was performed on each drug dock with hub genes using Sybyl-X 2.1 software. It could help users to identify the molecule drugs against PCa.

Screening of DEGs
The RNA-seq data of 498 tumor samples and 52 normal samples obtained from the TCGA data set underwent DEG analysis by three algorithms: limma (Ritchie et al., 2015), DESeq2 (Love et al., 2014), and edgeR (Robinson et al., 2009). Using the cutoff criteria of adj.p.value of <0.05, 11,851 DEGs were identified by limma, which included 6484 up-regulated DEGs and 5,367 downregulated DEGs; 12,248 DEGs were screened by DESeq2, in which 6,548 were up-regulated and 5,700 were down-regulated; and 11,898 DEGs were selected by degeR, in which 6,659 DGEs were up-regulated and 5,239 DEGs were down-regulated. After that, 10,455 overlapping DEGs were chosen for further analysis by the above algorithms, which included 5,733 up-regulated DEGs and 4,722 down-regulated DEGs. A Venn diagram of DEGs is shown in Figure 2.

WGCNA Analysis and Identification of Key Module
We herein constructed the weighted co-expression network by the "WGCNA" R package. The height of the sample clustering was defined as 108, when the 14 outlier samples were excluded from further analysis (Supplementary Figure 1A). Clinical sample information of data sets TCGA-PRAD, such as age, Gleason score, and TNM stage, were added below the resulting tree ( Figure 3A). In the present study, the power of β was set as 8 [scale-free R 2 = 0.87 (Supplementary Figure 1B)] to ensure a scale-free network ( Figure 3B).  Figure 3F). Therefore, the magenta module was selected for further analysis.

Magenta Module Function Annotation
The "clusterprofiler" R package was used to conduct GO and KEGG analyses to investigate the function of the magenta module. The GO analysis demonstrated that the biological process (BP) of the magenta module showed large correlation with "chromosome segregation, " "nuclear division, " "organelle fission, " "nuclear chromosome segregation, " and "mitotic nuclear division" (Figure 4A). The cellular component (CC) of the magenta module showed significant association with "chromosomal region"; "chromosome, centromeric"; "condensed chromosome, centromeric region"; "condensed chromosome"; and "kinetochore" (Figure 4B). The molecular function (MF) of the magenta module was mainly related to "catalytic activity, acting on DNA"; "DNA helicase activity"; "DNA-dependent ATPase activity"; "helicase activity"; and "single-stranded DNAdependent ATP-dependent DNA helicase activity" (Figure 4C).  In addition, the KEGG analysis revealed that "cell cycle, " "DNA replication, " "oocyte meiosis, " "fanconi anemia pathway, " and "homologous recombination" were mainly enriched in the magenta module ( Figure 4D). This study illustrates that the genes within the magenta module were mainly involved in "cell cycle, " "DNA replication, " and "nuclear division."

Validation of the Expression of Real Hub Genes
Here, among the 41 hub genes screened above, four genes (CCNA2, CKAP2L, NCAPG, and NUSAP1) were selected as our target hub genes and were seldom reported in PCa. After that, these genes were validated by the internal (TCGA) and external validation data sets (GSE32571, GSE70770, and GSE141551), respectively. First, the expression of these selected genes was verified between PCa samples and normal samples. Based on the TCGA data set, those four genes showed higher expression in the PCa tissues than normal tissues (Figure 5A), and the result is consistent with that in the validation data set (GSE32571; Figure 5B). Second, these genes were part of the magenta module, which showed significant association with the Gleason scores and the T stages of PCa. Thus, CCNA2, CKAP2L, NCAPG, and NUSAP1 were highly differentially expressed in PCa samples with different Gleason scores and T stages no matter whether in internal or external validation data sets.
These results indicate that higher expression of hub genes showed correlation with higher Gleason scores and advanced T stages (Figures 5C-G). ANOVA and Student's t-test were used to measure statistical significance of the calculated results.
As for the prognosis, the survival analysis in the TCGA-PRAD data set was carried out to observe the disease-free survival of those four genes. This demonstrates that higher expression levels of those genes indicates poor disease-free survival ( Figure 6A). The results are also consistent with that of the data set GSE70770 ( Figure 6B). Furthermore, the ROC curve exhibited the excellent diagnostic and prognostic value of real hub genes in distinguishing between PCa and normal tissues ( Figure 6C). In addition, immunohistochemistry staining based on the Human Protein Atlas database revealed that the protein levels of these four genes were significantly higher in tumor tissues compared with normal tissues. Similarly, compared with high-grade prostate tissues, the protein levels of hub genes were also significantly higher than low-grade prostate tissues ( Supplementary Figures 2A-L).

GSEA and GSVA Reveal the Function of Hub Genes
Gene set enrichment analysis and GSVA analyses were conducted to further explore the potential functions of CCNA2, CKAP2L, NCAPG, and NUSAP1. As shown in Figures 7A-D, genes in high expression groups of CCNA2, CKAP2L, NCAPG, and NUSAP1 show enrichment in "cell cycle, " "DNA replication, " "homologous recombination, " and "mismatch repair" pathways. Subsequently, our analysis in the GSE32571 data set also produced similar results (Supplementary Tables 1-8). This evidence confirms that these real genes were highly connected with carcinoma proliferation. Furthermore, by applying GSEA analysis on the TCGA data, the previously reported "cell cycle, " "DNA replication, " "homologous recombination, " and "mismatch repair" pathways also exhibited a higher enrichment score in the high expression groups of these genes, further indicating their relationship with activation of proliferative processes (Figures 7E-H).

Analysis of Tumor Purity and Immune Infiltration
The TIMER was used to assess immune cell infiltration levels of each sample based on the expression of hub genes. The expression of CCNA2 and CKAP2L showed positive association with infiltrating levels of B cells, CD8 + T cells, CD4 + T cells, neutrophils, macrophages, and dendritic cells, but were weakly positive with tumor purity (Figures 8A,B). Then, NCAPG expression showed a positive correlation with tumor purity and infiltration of B cells, neutrophils, and dendritic cells, and then, no or weak associations were observed between NCAPG and infiltration of CD8 + T cells, CD4 + T cells, and macrophages ( Figure 8C). After that, NUSAP1 showed a positive relation with tumor purity and B cells, but the CD8 + T cells, neutrophils, macrophages, and dendritic cells showed no correlation with CD4 + T cells ( Figure 8D). These results uncovered that these four genes were closely associated with the immune infiltration process of PCa, which might be a reason for them to become valid prognostic markers. Interestingly, in most of the samples, the tumor purity of these real hub genes was more than 0.5 (Figures 8A-D). This implies that the genes were mainly expressed in the tumor cells, proving the clinical diagnostic value of CCNA2, CKAP2L, NCAPG, and NUSAP1 again.

Hub Genes Alterations in PCa
The PCa patient data in the TCGA database based on the cBioPortal database were utilized to analyze the alteration of the four key genes. As shown in Supplementary Figure 3A, CCNA2, CKAP2L, NCAPG, and NUSAP1 showed alteration of 6, 4, 4, and 6%, respectively, with mRNA high and deep deletion as the main types. Furthermore, among all the selected patients (499), four hub genes were altered in 60 (12%; Supplementary  Figures 3A,B).

Identification of Potentially Small Molecules
To identify candidate small molecules of PCa, the CMap database was used to screen out some small molecule drugs. According to the analysis of DEGs in the magenta module between tumor and normal samples, relevant small molecule drugs with high connection were identified. We herein set the criteria as number of instances (n > 10) and P value of <0.05, and then, 11 small molecule drugs were filtered (listed in Table 2). Both of them showed a negative correlation and had the potential to reverse the status of PCa. This analysis provides new ideas for the treatment of PCa. One-way analysis of variance (ANOVA) and Student's t-test were utilized to calculate statistical differences in these data sets.

Correlation Between Hub Genes and Compounds via Molecular Docking
A molecular docking study was carried out to investigate the association between molecules and hub genes. Using the SYBYL-X 2.1.1 software, the top 11 small molecule drugs identified by CMap analysis were docked with hub genes to obtain docking scores (Tables 3-6). Among the 11 top compounds, 15-delta prostaglandin J2 had the highest binding affinity to the four hub genes according to total score.

DISCUSSION
Prostate cancer is a highly malignant tumor with complex pathogenesis. Recently, accumulated evidence has attempted to identify hub genes that play a significant role in the development and metastasis of PCa by virtue of microarray and RNA-seq (Varambally et al., 2005;Xu et al., 2018;Laufer-Amorim et al., 2019;Ma et al., 2019). However, to the best of our knowledge, few studies address interactions between genetic data and interrelated clinical information. In the current study, gene expression data sets and clinical data from the TCGA and GEO databases were used. Next, WGCNA was used to investigate gene co-expression modules that are correlated with the progression of PCa. After a range of rigorous screening, 41 hub genes (ANLN, ASF1B,  AURKA, BUB1, CCNA2, CDC25C, CDCA5, CDCA8, CDK1,  CDKN3, CENPA, CENPF, CENPI, CEP55, CKAP2L, DLGAP5,  ERCC6L, ESPL1, EXO1, GTSE1, IQGAP3, KIF18B, KIF20A,  KIF23, KIF2C, KIF4A, KIFC1, MELK, NCAPG, NEIL3, NEK2,  NUF2, NUSAP1, PLK1, POLQ, RACGAP1, SGOL1, SKA3, SPAG5, TOP2A, and TPX2) that have a close relationship with Gleason scores and T stage were ultimately obtained. Most of these have been reported several times in PCa (Guo et al., 2006;Tamura et al., 2007;Vainio et al., 2012;Huang et al., 2017;Wilson et al., 2018). We herein selected four genes that are seldomly noticed in PCa (CCNA2, CKAP2L, NCAPG, and NUSAP1) as our target genes to further explore their function and value. CCNA2, which is also known as cyclin A2, is a member of the cyclin family. CCNA2 is reported to regulate the cell cycle by binding to CDK or CDK2 to affect the G1/S phase and G2/M phase, respectively (Pagano et al., 1992). Published literature indicates that CCNA2 is overexpressed in multiple tumors, including breast, colorectal, gastric, pancreatic, and lung cancers (Gao et al., 2014;Gan et al., 2018;Zhang et al., 2018;Dong et al., 2019;. As such, those cancers also showed significant association with histological grade, tumor stage, disease-free survival, and overall survival. In estrogen receptor + breast cancer, Gao et al. (2014) also found high levels of CCNA2, which showed correlation with tamoxifen treatment failure, in which it not only can be used as an independent prognostic factor, but also contributed to the monitoring of tamoxifen efficacy. Our results reveal that CCNA2 is up-regulated in PCa tissues when compared with normal tissues and that its expression is closely connected with Gleason scores, tumor stage in the TCGA database, and GEO data sets. Our study further corroborated Yang et al. (2020)'s research. All these suggest that CCNA2 plays an important role in PCa proliferation.
CKAP2L is a microtubule-associated protein that occurs during the mitotic phase and is involved in neural progenitor cell division (Yumoto et al., 2013). Specifically speaking, downregulation of CKAP2L gives rise to separation between the multipolar spindles and the chromosome in the neural progenitor cells. In addition, Jakobsen et al. (2011) identified CKAP2L as part and parcel of the centrosome situated in the spindle, i.e., in the midbody and the spindle pole. According to the published literature, little is known about the role of CKAP2L in tumor development and proliferation. A recent paper reports high expression of CKAP2L, which induces the invasion of lung adenocarcinoma by the MAPK signaling pathway, and shows correlation with poor prognosis (Xiong et al., 2019). Furthermore, CKAP2 acts as a significant paralog of CKAP2L, and the oncogenic nature and overexpression of this gene is revealed in PCa (Yu et al., 2015), ovarian cancer (Gao et al., 2017), and glioma . In this regard, as a novel mitotic spindle protein, we speculate that it might regulate cancer progression by taking part in the polymerization of microtubules and then affecting cell mitosis.
NCAPG is a key component of the condensin complex and is highly correlated with the condensation and stabilization of chromosomes during mitosis and meiosis (Murphy and Sarge, 2008). It is located on human chromosome band 4p15.32 and encoded by the NY-MEL-3 gene (Jäger et al., 2000). To date, previous study shows the involvement of NCAPG in hepatocellular carcinoma and breast, lung, and ovarian cancers (Cao and Zhang, 2016;Cava et al., 2018). In hepatocellular carcinoma, Wang  verifies that depletion of NCAPG contributes to hepatocellular carcinoma cell cycle arrest at the S phase and induces apoptosis. In addition, they also find that NCAPG could serve as a promoter of invasion and metastasis of liver cancer. In PCa, our study demonstrates that the expression of NCAPG shows significant association with tumor stage and disease-free survival, and this is in agreement with the reports put forwarded by Arai (Arai et al., 2018) in castration-resistant PCa clinical specimens.
As a cell-cycle related protein, NUSAP1 plays an important role in mitotic progression, spindle formation, and stability. In 2003, Raemaekers et al. (2003) first found it as a novel 55-kD vertebrate protein and showed selective expression in proliferative cells. There is much evidence to indicate that NUSAP1 is closely associated with apoptosis, proliferation, and metastasis. For instance, in cervical cancer , NUSAP1 is shown to bound to the SUMO-E3 ligase Ran binding protein 2 (RanBP2) to induce the sumoylation of TCF4, thereby enhancing the Wnt/β signaling pathway and inducing tumor metastasis. Similarly, a high NUSAP1 expression level facilitated the development of bladder cancer by regulating epithelial-mesenchymal transition of bladder cancer via the TGFβ signaling pathway (Gao S. et al., 2020). In addition, Liu    reports that the mRNA and protein levels of NUSAP1 were more highly expressed in colon cancer than in paired non-cancerous adjacent tissues. These results also support the studies of Gordon et al. (2017). We herein also provide substantial evidence in support of the diagnostic and prognostic values of CCNA2, CKAP2L, NCAPG, and NUSAP1, which range from internal to external validation data sets. The genes CCNA2, CKAP2L, NCAPG, and NUSAP1 were not only significantly up-regulated in PCa tissues, but also positively associated with higher Gleason score and tumor stage, implicating significant contributions to the pathogenesis of PCa. In addition, survival analysis shows that high expression of hub genes is related to a shorter disease-free survival in PCa. Furthermore, ROC curve analysis of the four hub genes was conducted, and the protein expression levels were analyzed based on the HPA database, which provides evidence that these four genes have a high diagnostic and prognostic value in PCa.
It is well known that tumors are composed of not only tumor cells, but also of stroma and immune cells. Hence, several immune cells (B cells, CD4 + T cells, CD8 + T cells, macrophages, neutrophils, and dendritic cells) were assessed in TIMER. These hub genes show positive correlation with the majority of the above immune cells. Unexpectedly, the tumor purity of hub genes in most of the samples was more than 0.5. Based on these findings, we suspect that CCNA2, CKAP2L, NCAPG, and NUSAP1 might be mainly expressed in PCa cells.   GSEA and GSVA were further utilized to explore the biological function. Cell cycle and cell cycle-related pathways, such as DNA replication, mismatch repair, and recombination, were mainly observed as enriched pathways, indicating that they can lead to tumor proliferation. In addition, the CMap database was utilized to identify some small molecule drugs with potential therapeutic benefits against PCa. Most of these have been previously documented to have anticancer effects in various cancer types. Among the molecule drugs, vorinostat (Schneider et al., 2012), alvespimycin (Pacey et al., 2011), and tretinoin (Ueda et al., 2020) have undergone phase I clinical trials in cancer patients. In the early 1990s, some experimental evidence revealed that tanespimycin had antitumor activity in various human-derived tumor cell lines (Erlichman, 2009). Trichostatin A, which is a histone deacetylase inhibitor, had an underlying therapeutic effect in diverse cancer cells when combined with radiotherapy or chemotherapy. Trifluoperazine was originally an antipsychotic drug, whereas some recent literature report that it could inhibit cancer cell proliferation, such as hepatocellular carcinoma (Jiang et al., 2017), lung cancer (Yeh et al., 2012), and glioblastoma (Pinheiro et al., 2017). Moreover, the anticancer effects of other small molecule drugs are also mentioned by the researchers. Molecular docking analysis demonstrates that 15delta prostaglandin J2 had the highest binding affinity to four hub genes in 11 small molecule drugs. Prostaglandin J2, a potent activator of PPAR-γ (Kliewer et al., 1995), is shown to inhibit serum-stimulated cell proliferation in vascular smooth muscle cells. Thus, we speculated that 15delta prostaglandin J2 possessed the antitumor effects via inhibited serum-stimulated cell proliferation. In brief, this information remains to be beneficial for the development of targeted therapy in PCa.
However, there are some deficiencies that should be acknowledged. First, this was a retrospective study rather than a prospective cohort study, and all data in this study were acquired from the open available databases. In addition, further research, including in vivo and in vitro experiments are needed to elucidate the potential molecular mechanisms of how the four genes impact on Gleason score and tumor stage.
Taken together, CCNA2, CKAP2L, NCAPG, and NUSAP1 were successfully identified as our candidate genes and small molecular drugs with the potential to treat PCa. Of clinical significance, the four genes might serve as potential biomarkers for PCa, and these molecular drugs might provide a new avenue for the treatment of PCa.

AUTHOR CONTRIBUTIONS
All authors listed have made a substantial, direct and intellectual contribution to the work, and approved it for publication.

FUNDING
This study was supported by a grant from the Natural Science Foundation of Beijing (nos. 7172068 and 7192053).