ORIGINAL RESEARCH article

Front. Cell Dev. Biol., 23 February 2021

Sec. Epigenomics and Epigenetics

Volume 9 - 2021 | https://doi.org/10.3389/fcell.2021.639615

Establishment of Novel DNA Methylation-Based Prostate Cancer Subtypes and a Risk-Predicting Eight-Gene Signature

  • 1. Department of Urology, Shengjing Hospital of China Medical University, Shenyang, China

  • 2. Department of Breast and Endocrine Surgery, Tohoku University Hospital, Sendai, Japan

  • 3. PANASIAUSMLE, Association of Asian Medical Graduates, Toronto, ON, Canada

  • 4. Department of Spine and Joint Surgery, Shengjing Hospital of China Medical University, Shenyang, China

Abstract

Prostate cancer (PCa) is the most common malignant tumor affecting males worldwide. The substantial heterogeneity in PCa presents a major challenge with respect to molecular analyses, patient stratification, and treatment. Least absolute shrinkage and selection operator was used to select eight risk-CpG sites. Using an unsupervised clustering analysis, called consensus clustering, we found that patients with PCa could be divided into two subtypes (Methylation_H and Methylation_L) based on the DNA methylation status at these CpG sites. Differences in the epigenome, genome, transcriptome, disease status, immune cell composition, and function between the identified subtypes were explored using The Cancer Genome Atlas database. This analysis clearly revealed the risk characteristics of the Methylation_H subtype. Using a weighted correlation network analysis to select risk-related genes and least absolute shrinkage and selection operator, we constructed a prediction signature for prognosis based on the subtype classification. We further validated its effectiveness using four public datasets. The two novel PCa subtypes and risk predictive signature developed in this study may be effective indicators of prognosis.

Introduction

As the most common cancer in males, prostate cancer (PCa) is a major public health threat (Siegel et al., 2020). Based on the latest global cancer data from the World Health Organization (WHO)1, the age-standardized rate for PCa ranks second, 5-year prevalence ranks first, and age-standardized mortality ranks sixth. Despite the ongoing development of therapeutic strategies, the heterogeneity of PCa contributes to treatment failure (Chang et al., 2014; Peng et al., 2018). Therefore, it is necessary to identify subtypes of PCa during diagnosis and treatment.

During the process of DNA methylation, methyl groups are added to CpG islands on the DNA molecule. Hypermethylation acts on promoters and could lead to gene silencing, whereas hypomethylation is associated with chromosomal instability and a loss of imprinting (Daura-Oller et al., 2009). In many diseases, including cancer, abnormal hypermethylation on gene promoters, which could be inherited by daughter cells, has been detected (Wang and Lei, 2018). Abnormal DNA methylation status is now considered a significant determinant of cancer development. Abundant stable DNA methylation in the genome is a candidate for diagnosis and treatment (Mikeska and Craig, 2014). Accordingly, the use of DNA methylation status to divide PCa into subtypes may provide important new insights.

Novel methylation-based subtypes have been reported in PCa. For example, the Cancer Genome Atlas Research Network (Abeshouse et al., 2015) conducted a comprehensive molecular analysis of 333 primary prostate carcinomas and identified different subtypes with highly diverse genomic, epigenomic, and transcriptomic patterns. In particular, an unsupervised hierarchical cluster analysis of the 5,000 most variable hypermethylated CpG sites revealed four epigenetically distinct subtypes of PCa. In another multicenter study, a new epigenetic CpG methylator phenotype in advanced PCa was reported, and this subtype is characterized by hypermethylation both within and outside CpG sites, shores, and shelves (Zhao et al., 2020). These important studies have improved our understanding of the DNA methylation landscape in PCa, providing a reference for future research. Recent studies have focused on revealing associations of novel DNA methylation subtypes with driver events in the genome and transcriptome. Utilizing a different approach, in this study, we identified DNA methylation subtypes based on clinical outcomes, with further analyses linking these subtypes with molecular mechanisms. In particular, we screened out eight CpG sites in which the DNA methylation status was associated with prognosis in PCa and identified two subtypes based on these DNA methylation statuses. Thereafter, differences in the epigenome, genome, transcriptome, disease status, immune cell infiltration, and function between these two subtypes were explored. Lastly, genes related to the high-risk subtype were selected and screened to construct an eight-gene signature with the ability to predict prognosis. The effectiveness of the signature was validated using four public datasets.

Materials and Methods

Data Processing

RNA-seq data (in the form of HTSeq-Counts and HTSeq-FPKM), DNA methylation (450 K), somatic variation, copy number alterations (CNA), and clinical information for patients with PCa were downloaded from The Cancer Genome Atlas (TCGA) database2 (Blum et al., 2018). The gene annotation file was downloaded from the Ensembl database3 (Howe et al., 2020). RNA-seq data in FPKM were converted to TPM. The TPM and β-values for CpG sites were quantile-normalized. In total, 477 patients with both the data described above (RNA-seq and methylation data) and complete clinical information were included. Disease-related clinical information for these patients is provided in Table 1. To validate the effectiveness of the risk signature, gene expression profiles and clinical data from four public datasets were used, i.e., GSE70769 and GSE116918 from the Gene Expression Omnibus (GEO) database4 (Barrett et al., 2013; Ross-Adams et al., 2015; Jain et al., 2018) and DKF2018 and MSKCC2010 from cBioPortal for Cancer Genomics5 (Cerami et al., 2012; Gao et al., 2013). Information about these four datasets is provided in Table 2.

TABLE 1

CharacteristicsValue
Patients (n)477
Age (year), median (IQR)62.0 (56.0–66.0)
PSA (ng/ml), median (IQR)7.5 (5.1–11.4)
Pathological Gleason score, n (%)
≤643 (9.0%)
7 (3+4)143 (30.0%)
7 (4+3)100 (21.0%)
856 (11.7%)
9∼10135 (28.3%)
Prior malignancy, n (%)
No450 (94.3%)
Yes27 (5.7%)
Race, n (%)
Asian12 (2.5%)
Whit, American Indian or Alaska native398 (83.4%)
Black or African American55 (11.6%)
NA12 (2.5%)
Residual tumor, n (%)
R0301 (63.1%)
R115 (3.1%)
R2142 (29.8%)
Rx5 (1.0%)
NA14 (3.0%)
Clinical M, n (%)
M0437 (91.6%)
M1a or M1c2 (0.4%)
NA38 (8.0%)
Pathological T, n (%)
T1c2 (0.4%)
T2a13 (2.7%)
T2b10 (2.1%)
T2c160 (33.5%)
T3a151 (31.7%)
T3b129 (27.0%)
T49 (1.9%)
NA3 (0.7%)
Pathological N, n (%)
N0329 (69.0%)
N178 (16.4%)
NA70 (14.6%)
Diagnostic CT or MRI, n (%)
No evidence of extraprostatic extension196 (41.1%)
Equivocal6 (1.3%)
Extraprostatic extension localized22 (4.6%)
Extraprostatic extension9 (1.9%)
NA244 (51.1%)
Outcome, n (%)
DFS53 (11.1%)
Disease free424 (88.9%)

The disease-related clinical information of patients with PCa included in the study.

PCa, prostate cancer; PSA, prostate-specific antigen; DFS, disease-free survival; IQR, interquartile range; NA, not analyzed.

TABLE 2

DatasetSample sizeTranscriptome platformTissue
DKFZ201882Illumina HiSeq 2000 (RNAseq)Fresh frozen
GSE7076990Illumina humanHT-12 V4.0Fresh frozen
GSE1166918248ADXPCv1a520642 Affymetrix humanFormalin-fixed Paraffin-embedded
MSKCC2010140Affymetrix human Exon 1.0 ST arrayFresh frozen

Information of the four publicly available independent validation datasets.

Identification of DNA Methylation-Based Subtypes

We previously identified 120 CpG sites that were differentially methylated between PCa and normal prostate tissues and were significantly associated with disease-free survival (DFS) (Zhang et al., 2020b). Least absolute shrinkage and selection operator (LASSO) regression enables variable selection and regularization, while fitting the generalized linear model. Therefore, LASSO regression was used to reduce the number of CpG sites as the input for subtype identification using the glmnet R package (Engebretsen and Bohlin, 2019). After the CpG sites were screened, the β values for these sites in 477 patients were used as inputs for consensus clustering, an unsupervised clustering analysis. Consensus clustering was performed using the ConsensusClusterPlus R package (Monti et al., 2003) and the following operating parameters: maxK = 10, reps = 1000, pItem = 0.8, pFeature = 1, clusterAlg = hc, and distance = pearson. A heatmap was generated to visualize the methylation status of the subtypes using the pheatmap R package (Kolde and Kolde, 2015). Furthermore, a survival analysis of the subtypes was performed by the log rank test using the survival R package (Therneau and Lumley, 2014). To further demonstrate the differences between subtypes, a principal component analysis was performed using the 477 patients. Finally, correlations between disease-related clinical information and subtype status were evaluated by the Mann–Whitney U test, χ2 test, or Fisher’s exact test.

Single Nucleotide Variation Between Subtypes

Simple nucleotide variants were compared between subtypes using the GenVisR R package (Skidmore et al., 2016). Genes with the top 10 mutation frequencies were displayed in a waterfall plot. According to the results of the waterfall plot, the difference between the mRNA levels of mutant and wild-type Speckle-type POZ Protein (SPOP) was evaluated. Then, the mRNA levels of SPOP in different subtypes were compared by Wilcoxon’s test.

Differences in Copy Number Alterations, TMPRSS2–ERG Fusion, and Androgen Receptor Scores Between Subtypes

To explore the difference in CNAs between subtypes, genes with significant differences in copy number between subtypes were identified by chi-squared tests. Among these genes, RND3 was differentially expressed between the subtypes, as determined by a Wilcoxon test. The type and frequency of CNAs in RND3 were explored. Furthermore, the relationship between CNA types and mRNA expression levels of RND3 were evaluated by the Wilcoxon test.

Data from The Tumor Fusion Gene Data Portal database (https://www.tumorfusions.org/) were used to analyze the difference in TMPRSS2–ERG fusion gene expression between the subtypes (Yoshihara et al., 2015). Finally, the androgen receptor (AR) score in each subtype was compared by the Wilcoxon test. AR scores were obtained from the cBioPortal database (Cancer Genome Atlas Research Network, 2015).

Immune Cells in the Tumor Microenvironment in Each Subtype

RNA-seq data in TPM format were uploaded to CIBERSORTx6 (Newman et al., 2019) to evaluate the infiltration of 22 types of immune cells in the tumor microenvironment. The abundances of these immune cells were compared between subtypes using a violin plot and the Wilcoxon test. Furthermore, survival curves were generated for these cells. The p-values for the survival analysis were calculated by a Cox regression and log-rank test using the survival R package (Therneau and Lumley, 2014).

Functional Enrichment Analysis of Subtypes

Fold change values for gene expression differences between the subtypes were used as the ranks in a gene set enrichment analysis (GSEA). To obtain fold changes, HTSeq-Counts were analyzed using the DESeq2 R package (Love et al., 2014). The hallmark gene set downloaded from the Molecular Signatures Database9 v7.17 was used as the reference gene list in the GSEA (Subramanian et al., 2005; Liberzon et al., 2011). Finally, the GSEA was completed using the clusterProfiler R package (Yu et al., 2012).

Furthermore, expression levels of genes that were crucial for PCa were compared between the subtypes by Wilcoxon tests.

Weighted Correlation Network Analysis of Subtypes

A weighted correlation network analysis (WGCNA) could be used find phenotype-associated gene modules (Langfelder and Horvath, 2008; Li et al., 2019). Therefore, TPM values from RNA-seq data were used as the input for a WGCNA. Eight was the soft power threshold to construct a network that simultaneously satisfied a scale-free topology and high connectivity. Pearson correlation coefficients for the relationships between phenotypes and gene modules were determined. The phenotypes included PSA (prostate-specific antigen), Gleason score, and the subtypes. The gene module most closely associated with the high-risk phenotypes was identified. Differentially expressed genes (DEGs) between PCa and normal prostate tissues in the selected gene module were identified. The conditions for DEGs were logarithmic fold changes (| LFCs|) > 1 and p < 0.05. Then, survival-associated genes were screened from the DEGs by Cox regression and log-rank tests. Finally, genes for LASSO were filtered out.

Identification of a Risk Signature in the Training Set

Before training, 477 patients were randomly divided into a training set and internal validation set using the caret R package (Kuhn, 2008). Information for patients in the training set is provided in Supplementary Table 1 and information for the internal validation set is provided in Supplementary Table 2. LASSO regression was used to construct a single signature for predicting prognosis with high performance (Svane et al., 2018). LASSO regression was applied to the training set; during the selection of genes, the C-index after 10-fold cross-validation reflected the effect of different screening strategies. Genes with the maximal C-index values were selected for the prognostic signature using the glmnet R package (Engebretsen and Bohlin, 2019) with the following parameter settings: family = Cox, type.measure = C, parallel = TRUE, with default settings for other parameters. Furthermore, the difference in the risk score between subtypes identified and the relationship between the risk score and survival were evaluated. Comparisons were performed using the Wilcoxon test.

Predictive Accuracy of the Signature

First, time-dependent receiver operating characteristic (tdROC) curves were used to evaluate the predictive accuracy of the signature in the training set, internal validation set, and external validation sets (DKFZ2018, GSE70769, GSE116918, and MSKCC2010) using the timeROC R package (Blanche and Blanche, 2013). Then, a survival analysis by Cox regression and the log-rank test was performed using these datasets. Survival curves were plotted using the Kaplan–Meier method and the survminer R package (Kassambara et al., 2017).

Univariate and multivariate Cox regression analyses were used to explore whether the risk score is an independent predictor of prognosis. Finally, the clinical diagnostic value of the signature was compared with that of clinical features (Gleason score and PSA) by a decision curve analysis (DCA) (Van Calster et al., 2018). DCA is used to compare prediction models that incorporate clinical outcomes; it requires only the dataset on which the models are tested and can be applied to models with either continuous or dichotomous results (Zhang et al., 2020a).

Statistical Analysis

R 3.6.3 was used for all statistical analyses. Values of p < 0.05 were defined as statistically significant. In the survival analysis, the survival outcome was defined as DFS or biochemical recurrence-free survival (BCR) based on clinical records.

Results

Identification of Two DNA Methylation-Based Subtypes

The cumulative distribution function (CDF) and relative change in the area under the CDF curve are shown in Figures 1A,B, respectively. According to Monti et al. (2003), the optimal k value is determined by a number of factors. One of the criteria is that when the optimal k value is reached, the area under the CDF curve will not increase significantly with increases in k. Therefore we first assumed that the optimal value in this study was set to k = 5, indicating that the cohort could be divided into up to five subtypes. However, one cluster consisted of only a single patient when k = 4 or 5. Additionally, the cluster-consensus value for each cluster was not large enough under k = 4 or 5 (Supplementary Figure 1). Therefore, we focused on k = 2 or 3. For k = 3, patients in C1 showed a worse prognosis, and patients in C2 and C3 did not show an obvious difference in prognosis (Figure 1C). For k = 2, patients in Methylation_H had worse a prognosis than that of patients in Methylation_L (Figure 1D). Furthermore, the Methylation_H subtype and C1 subtype included the same patients and the Methylation_L subtype consisted of patients in the C2 and C3 subtypes. Groups for different k values are shown in Supplementary Table 3. Ultimately, we identified two subtypes with a difference in prognosis. The consensus matrix is displayed in the form of a heatmap in Figure 1E and the consensus clustering analysis is summarized in Supplementary Figure 1. A principal component analysis revealed clear separation between the Methylation_H subtype and the Methylation_L subtype (Figure 1F). As shown in Figure 1G, these two subtypes had different levels of DNA methylation at CpG sites. We defined the hypermethylated subtype as Methylation_H and the hypo methylated subtype as Methylation_L. As shown in Table 3, age, PSA, Gleason score, residual tumor status, pathological T, and clinical outcome of patients between Methylation_H and Methylation_L subtypes are different significantly.

FIGURE 1

TABLE 3

Clinicopathologic variablesConsensus clusters
P
Methylation_H
(n = 220)
Methylation_L
(n = 257)
Age (year), median (IQR)63.0 (57.8–67.0)60.0 (55.0–65.0)0.001a
PSA (ng/ml), median (IQR)8.0 (5.8–12.9)6.8 (4.6–10.1)< 0.001a
Pathological Gleason score, n (%)< 0.001b
≤611 (5.0%)32 (12.5%)
7 (3+4)53 (24.1%)90 (35.0%)
7 (4+3)45 (20.5%)55 (21.4%)
826 (11.8%)30 (11.7%)
9∼1085 (38.6%)50 (19.4%)
Prior malignancy, n (%)0.828b
No207 (94.1%)243 (94.6%)
Yes13 (5.9%)14 (5.4%)
Race, n (%)0.529b
Asian7 (3.2%)5 (2.0%)
Whit, American Indian or Alaska native181 (82.3%)217 (84.4%)
Black or African American28 (12.7%)27 (10.5%)
NA4 (1.8%)8 (3.1%)
Residual tumor, n (%)
R0126 (57.3%)175 (68.1%)0.005b
Rx/R1/R290 (40.9%)72 (28.0%)
NA4 (3.0%)10 (3.9%)
Clinical M, n (%)0.226c
M0207 (94.1%)230 (89.5%)
M1a or M1c2 (0.9%)0 (0.0%)
NA11 (5.0%)27 (10.5%)
Pathological T, n (%)0.013c
T1c0 (0.0%)2 (0.8%)
T2a3 (1.4%)10 (3.9%)
T2b3 (1.4%)7 (2.7%)
T2c63 (28.6%)97 (37.7%)
T3a71 (32.3%)80 (31.1%)
T3b73 (33.2%)56 (21.8%)
T46 (2.7%)3 (1.2%)
NA1 (0.4%)2 (0.8%)
Pathological N, n (%)0.289b
N0151 (68.6%)178 (69.3%)
N141 (18.6%)37 (14.4%)
NA28 (12.8%)42 (16.3%)
Diagnostic CT or MRI, n (%)
No evidence of extraprostatic extension95 (43.2%)101 (39.3%)0.135c
Equivocal5 (2.3%)1 (0.4%)
Extraprostatic extension localized10 (4.5%)12 (4.7%)
Extraprostatic extension7 (3.2%)2 (0.8%)
NA103 (46.8%)141 (54.8%)
Outcome, n (%)
DFS35 (8.3%)18 (7.0%)0.002b
Disease free185 (91.7%)239 (93.0%)

The association between subtypes and disease-related clinical information of PCa.

P values were calculated by the Mann–Whitney U test (a), Chi-square test (b), or Fisher’s exact test (c). PCa, prostate cancer; PSA, prostate-specific antigen; DFS, disease-free survival; IQR, interquartile range; NA, not analyzed.

Single Nucleotide Variations in Methylation_H and Methylation_L

Single nucleotide variations in genes with the top 10 mutation frequencies in these two subtypes are shown in Figures 2A,B. We found that the frequency of single nucleotide variations in SPOP was higher in the Methylation_H subtype than in the Methylation_L subtype. SPOP is one of the most frequently mutated genes in primary PCa. Based on the tumor-suppressive role of SPOP in PCa and the results of loss-of-function assays, SPOP mutations are expected to include the invasion and proliferation of PCa cells (Barbieri et al., 2012; An et al., 2014). Furthermore, within the Methylation_H subtype, mRNA expression levels of SPOP in patients with mutations were significantly lower than those in patients with wild-type SPOP (Figure 2C). However, this pattern was not observed in the Methylation_L subtype (Figure 2D). Finally, we found that the mRNA expression level of SPOP in the Methylation_H subtype was significantly lower than that in the Methylation_L subtype (Figure 2E). These results supported the risk characteristics of Methylation_H.

FIGURE 2

Copy Number Alterations of RND3, TMPRSS2–ERG Fusion, and Androgen Receptor Scores Between the Methylation_H and Methylation_L Subtypes

We identified a group of genes with a significant difference in copy number between subtypes. Among these genes, RND3, also called RhoE, is a tumor suppressor that is downregulated early in the development of PCa (Bektic et al., 2005). Interestingly, the expression of RND3 was significantly lower in the Methylation_H subtype than that in the Methylation_L subtype (Figure 3A). For RND3, the type of CNA and the frequency of CNAs are summarized in Figure 3B. In 477 patients, a single-copy deletion was the only CNA detected in RND3. This deletion was more frequent in the Methylation_H subtype. Additionally, RND3 was significantly down-regulated in cases with the single-copy deletion (Figure 3C).

FIGURE 3

The frequency of the TMPRSS2–ERG fusion was significantly lower in the Methylation_H subtype than in the Methylation_L subtype (Figure 3D). Some studies have revealed that the TMPRSS2–ERG fusion is related to the invasiveness of PCa and a higher Gleason score (Perner et al., 2006; Mehra et al., 2007; Rajput et al., 2007; Cheville et al., 2008). However, other studies have reported that the TMPRSS2–ERG fusion is not related to prognosis in PCa (Yoshimoto et al., 2006; Tu et al., 2007; Darnel et al., 2009). Furthermore, AR scores for patients in the Methylation_H subtype were higher than those of patients in the Methylation_L subtype (Figure 3E).

Immune Cells in the Tumor Microenvironment in Each Subtype

In Figure 4A, the difference in the immune cell composition in the tumor microenvironment of each subtype is displayed in the form of a violin plot. Plasma cells and resting mast cells were significantly less abundant in the Methylation_H subtype and regulatory T cells (Tregs), M1 macrophages, and M2 macrophages were significantly more abundant in the Methylation_H subtype than in the Methylation_L subtype. Among these immune cells, greater M1 and M2 macrophage infiltration in the tumor microenvironment was related to a worse prognosis in PCa (Figures 4B,C).

FIGURE 4

Functional Enrichment Analysis of Each Subtype

Based on a GSEA, we ranked enriched terms in descending order of normalized enrichment scores. The top ten enriched terms are displayed in the Figure 5A. Among these terms, HALLMARK_E2F_TARGETS, HALLMARK_G2M_ CHECKPOINT, HALLMARK_MYC_TARGETS_V1, HALLMARK_MYC_TARGETS_V2, and HALLMARK_ MTORC1_SIGNALING were enriched in the Methylation_H subtype (Figures 5B–F).

FIGURE 5

Furthermore, the mRNA expression levels of AURKA, DLGAP5, FOXD1, KIF4A, MELK, MYBL2, SPAG5, and TPX2 were significantly higher in the Methylation_H subtype than in the Methylation_L subtype (Figure 6). These genes have all been reported to facilitate the development and progression of PCa (Kuner et al., 2013; Zhang et al., 2016, 2020b; Akamatsu et al., 2018; Hewit et al., 2018; Zou et al., 2018; Cao et al., 2020; Li et al., 2020).

FIGURE 6

WGCNA for the Identification of a Key Gene Module

Setting eight as the soft threshold, the independence of the scale-free topology and the mean connectivity in each module were sufficient (Figures 7A,B). After the dynamic cut and merge process, 15 gene modules were generated (Figure 7C). Among these, the MEgreen module was positively correlated with the PSA level, Gleason score, and the Methylation_H subtype (Figure 7D) and was negatively correlated with the Methylation_L subtype. Therefore, genes in this module were related to the development and progression of PCa. According to the flow diagram in Figure 7E, we then screened out the DEGs between PCa and normal prostate tissues in the MEgreen module. Survival-associated DEGs were further identified for LASSO regression.

FIGURE 7

Construction of the Gene-Based Risk Signature

As shown in Figure 8A, when eight genes were included in the signature, the C-index value was maximized. Accordingly, the eight-gene signature had the best predictive value during the training process. Figure 8B presents the coefficients for each gene during the training process. Finally, an eight-gene signature for predicting the risk score was constructed as follows:

FIGURE 8

Patients in the training and internal validation set were arranged in ascending order based on risk scores (Figures 8C,F). Setting the median risk score as the threshold, the frequencies of cancer-specific death or biochemical recurrence were higher in high-risk patients than in low-risk patients in the training and internal validation sets (Figures 8D,G). The expression modes of eight genes in the signature are displayed in Figures 8E,H. Furthermore, patients in the Methylation_H subtype had significantly higher risk scores than patients in the Methylation_L subtype (Figure 8I). Finally, we found that patients with cancer-specific death or biochemical recurrence had higher risk scores in the training set, internal validation set, and external validation sets (DKFZ2018, GSE70769, GSE116918, and MSKCC2010) (Figures 8G,K and Supplementary Figure 2).

Validation of the Signature

The areas under the curve of the tdROC (reflecting the effectiveness of a classifier) for the training set, internal validation set, and external validation sets (DKFZ2018, GSE70769, GSE116918, and MSKCC2010) were 0.72, 0.66, 0.76, 0.76, 0.84, and 0.74, respectively (Figures 9A–F). Furthermore, a survival analysis revealed that a higher risk score was associated with a worse prognosis. Similar results were observed for all data sets evaluated (Figures 9G–L).

FIGURE 9

In univariate Cox regression analyses, six variables (subtype, pathological N, pathological T, Gleason score, PSA, and risk score) were associated with a worse prognosis (Figure 10A). In a multivariate Cox regression, Gleason score and risk score were identified as independent indicators for prognosis (Figure 10B). Within a wider range of threshold probabilities, the clinical net benefit was greater for the risk score than for the PSA level or Gleason score (Figure 10C). Within this range of threshold probabilities, the signature provides a more accurate prediction, thereby reducing the number of patients with a worse prognosis (Figure 10D).

FIGURE 10

Discussion

We have recently described the advantages and necessity of multi-omics approaches for studies of PCa (Zhang et al., 2020c). In this study, we identified two subtypes with different DNA methylation statuses and found that the Methylation_H subtype was related to a worse prognosis. The subtypes were then comprehensively compared with respect to the epigenome, genome, transcriptome, disease status, immune cell infiltration, and function.

The mRNA levels of SPOP, which is the most frequently mutated tumor-suppressor gene in primary PCa, were lower in the Methylation_H subtype than in the Methylation_L subtype (Barbieri et al., 2012; An et al., 2014). Additionally, the mutation frequency in SPOP was higher in the Methylation_H subtype, and SPOP expression was lower in mutants. Another tumor suppressor that is downregulated early in the development of PCa, RND3, was expressed at significantly lower levels in the Methylation_H subtype than in the Methylation_H subtype (Bektic et al., 2005). The single-copy deletion of RND3 was more frequent in the Methylation_H subtype and this deletion corresponded with the downregulation of RND3. AR scores, which reflect disease progression, were also significantly higher in the Methylation_H subtype than in the Methylation_L subtype. M1 and M2 macrophages showed a greater degree of infiltration in Methylation_H. The polarization of macrophages into the M1 and M2 phenotypes plays a pivotal role in ovarian cancer initiation, progression, and metastasis and provides targets for macrophage-centered treatment in the cancer microenvironment (Cheng et al., 2019). Consistent with these previous findings, we found that increased levels of M1 and M2 macrophages in the tumor microenvironment were related to a worse prognosis in PCa.

In the Methylation_H subtype, E2F, MYC, mTORC1, and G2M checkpoint were activated. E2F, MYC, and mTORC1 have been shown to promote the development of PCa (Huang et al., 2017; Zhang et al., 2017; Labbé et al., 2019). Furthermore, G2M checkpoint activation is related to a reduced cancer sensitivity to chemotherapy or radiation (Morgan, 2007). Furthermore, the mRNA expression levels of AURKA, DLGAP5, FOXD1, KIF4A, MELK, MYBL2, SPAG5, and TPX2 were significantly higher in the Methylation_H subtype than in the Methylation_L subtype. The gain of the AURKA oncogene is an important genomic change related to treatment-related neuroendocrine PCa (Akamatsu et al., 2018). Androgen-dependent PCa cells need DLGAP5 to stabilize mitotic health and function, and the knockdown of DLGAP5 improves the efficacy of docetaxel (Hewit et al., 2018). The knockdown of FOXD1 and MYBL2 would inhibit the growth of androgen-independent PCa cells (Li et al., 2020; Zhang et al., 2020b). KIF4A plays an significant role in the progression of castration-resistant PCa and serves as a key determinant of resistance to endocrine therapy (Cao et al., 2020). MELK is associated with the cell survival rate and BCR in PCa (Jurmeister et al., 2018). SPAG5 expression is significantly associated with the clinical stage, lymph node metastasis, Gleason score, and BCR (Zhang et al., 2016). The knockdown of TPX2 increases chromosome mis-segregation and suppresses tumor cell growth in PCa (Pan et al., 2017). These genes driving the progression of PCa were all expressed more highly in the Methylation_H subtype than in the Methylation_L subtype, further supporting the high-risk characteristics of the Methylation_H subtype.

The conserved differences uncovered the high-risk characteristics of the Methylation_H subtype. We further employed WGCNA, a common method in systems biology, to identify a key gene module; this module was related to the Gleason score, PSA, and Methylation_H subtype. Survival-associated DEGs from this gene module were used to construct an eight-gene signature for predicting risk. The effectiveness of the signature was validated in TCGA and another four public datasets (DKFZ2018, GSE70769, GSE116918, and MSKCC2010). With respect to the clinical applications of these findings, we have the following suggestions. Because RNA-seq data in TPM format were used to train the signature, we suggest employing the same data format of data in clinical applications. Considering batch effects of measurement techniques, gene expression levels should be measured by similar techniques, even though the signature performed well in the validation data sets, in which genes were profiled by array-based methods. Furthermore, the risk levels were determined by the median risk score in the patient cohorts. In the future, the study cohort should be further expanded to obtain a more objective and stable threshold range.

Collectively, we identified two subtypes with different methylation statuses at eight CpG sites and evaluated the high-risk characteristics of the Methylation_H subtype based on epigenomics, genomics, transcriptomics, disease status, immune cell infiltration, and functional analyses. Finally, based on these two novel subtypes, an eight-gene predictive signature was constructed and validated using various public datasets.

Statements

Data availability statement

The original contributions presented in the study are included in the article/Supplementary Material, further inquiries can be directed to the corresponding author/s.

Author contributions

EZ, MZ, and YS were responsible for the design and conception of the research project. EZ, YS, MZ, FS, OM, JH, YG, and HW contributed to the data acquisition or data analysis and data cleaning. EZ and YS participated in the drafting of the manuscript and the rigorous modification of the manuscript to clearly convey the research contents. All authors are responsible for the authenticity and reliability of this study and have no objection to the final submitted manuscript.

Funding

This research was supported by the National Natural Science Foundation of China (No. 81802540), the Natural Science Foundation of Liaoning Province (No. 20180550985), the Shenyang Science and Technology Program for Young Innovative Talents (No. RC190386), and the 345 Talent Project of Shengjing Hospital of China Medical University. The 345 Talent Project includes the 30 project and the 50 project.

Acknowledgments

The results shown here are in whole based upon data generated by the TCGA Research Network (https://www.cancer.gov/tcga), GEO (https://www.ncbi.nlm.nih.gov/geo/), cBioPortal (https://www.cbioportal.org/), and The Tumor Fusion Gene Data Portal (https://www.tumorfusions.org/). Furthermore, we want to thank profs. YS and MZ for their guidance and financial support for this study. At the same time, we thank the Medical Research Center of Shengjing Hospital for providing us with laboratories.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fcell.2021.639615/full#supplementary-material

References

  • 1

    AbeshouseA.AhnJ.AkbaniR.AllyA.AminS.Andry ChristopherD.et al (2015). The molecular taxonomy of primary prostate cancer.Cell1631011–1025.

  • 2

    AkamatsuS.InoueT.OgawaO.GleaveM. E. (2018). Clinical and molecular features of treatment-related neuroendocrine prostate cancer.Int. J. Urol.25345–351. 10.1111/iju.13526

  • 3

    AnJ.WangC.DengY.YuL.HuangH. (2014). Destruction of full-length androgen receptor by wild-type SPOP, but not prostate-cancer-associated mutants.Cell Rep.6657–669. 10.1016/j.celrep.2014.01.013

  • 4

    BarbieriC. E.BacaS. C.LawrenceM. S.DemichelisF.BlattnerM.TheurillatJ.-P.et al (2012). Exome sequencing identifies recurrent SPOP, FOXA1 and MED12 mutations in prostate cancer.Nat. Genet.44685–689. 10.1038/ng.2279

  • 5

    BarrettT.WilhiteS. E.LedouxP.EvangelistaC.KimI. F.TomashevskyM.et al (2013). NCBI GEO: archive for functional genomics data sets–update.Nucleic Acids Res.41D991–D995. 10.1093/nar/gks1193

  • 6

    BekticJ.PfeilK.BergerA. P.RamonerR.PelzerA.SchäferG.et al (2005). Small G-protein RhoE is underexpressed in prostate cancer and induces cell cycle arrest and apoptosis.Prostate64332–340.

  • 7

    BlancheP.BlancheM. P. (2013). Package ‘TimeROC’: Time-Dependent ROC Curveand AUC for Censored Survival Data.Vienna: R Foundation for Statistical Computing.

  • 8

    BlumA.WangP.ZenklusenJ. C. (2018). SnapShot: TCGA-Analyzed Tumors.Cell173:530. 10.1016/j.cell.2018.03.059

  • 9

    Cancer Genome Atlas Research Network (2015). The molecular taxonomy of primary prostate cancer.Cell1631011–1025. 10.1016/j.cell.2015.10.025

  • 10

    CaoQ.SongZ.RuanH.WangC.YangX.BaoL.et al (2020). Targeting the KIF4A/AR Axis to reverse endocrine therapy resistance in castration-resistant prostate cancer.Clin. Cancer Res.261516–1528. 10.1158/1078-0432.CCR-19-0396

  • 11

    CeramiE.GaoJ.DogrusozU.GrossB. E.SumerS. O.AksoyB. A.et al (2012). The cBio cancer genomics portal: an open platform for exploring multidimensional cancer genomics data.Cancer Discov.2401–404. 10.1158/2159-8290.CD-12-0095

  • 12

    ChangA. J.AutioK. A.RoachM.ScherH. I. (2014). High-risk prostate cancer-classification and therapy.Nat. Rev. Clin. Oncol.11308–323. 10.1038/nrclinonc.2014.68

  • 13

    ChengH.WangZ.FuL.XuT. (2019). Macrophage polarization in the development and progression of ovarian cancers: an overview.Front. Oncol.9:421. 10.3389/fonc.2019.00421

  • 14

    ChevilleJ. C.KarnesR. J.TherneauT. M.KosariF.MunzJ.-M.TillmansL.et al (2008). Gene panel model predictive of outcome in men at high-risk of systemic progression and death from prostate cancer after radical retropubic prostatectomy.J. Clin. Oncol.263930–3936. 10.1200/JCO.2007.15.6752

  • 15

    DarnelA. D.LafargueC. J.VollmerR. T.CorcosJ.BismarT. A. (2009). TMPRSS2-ERG fusion is frequently observed in Gleason pattern 3 prostate cancer in a Canadian cohort.Cancer Biol. Ther.8125–130.

  • 16

    Daura-OllerE.CabreM.MonteroM. A.PaternainJ. L.RomeuA. (2009). Specific gene hypomethylation and cancer: new insights into coding region feature trends.Bioinformation3340–343.

  • 17

    EngebretsenS.BohlinJ. (2019). Statistical predictions with glmnet.Clin. Epigenetics11:123. 10.1186/s13148-019-0730-1

  • 18

    GaoJ.AksoyB. A.DogrusozU.DresdnerG.GrossB.SumerS. O.et al (2013). Integrative analysis of complex cancer genomics and clinical profiles using the cBioPortal.Sci. Signal.6:l1. 10.1126/scisignal.2004088

  • 19

    HewitK.SandilandsE.MartinezR. S.JamesD.LeungH. Y.BryantD. M.et al (2018). A functional genomics screen reveals a strong synergistic effect between docetaxel and the mitotic gene DLGAP5 that is mediated by the androgen receptor.Cell Death Dis.9:1069. 10.1038/s41419-018-1115-7

  • 20

    HoweK. L.AchuthanP.AllenJ.AllenJ.Alvarez-JarretaJ.AmodeM. R.et al (2020). Ensembl 2021.Nucleic Acids Res.49D884–D891. 10.1093/nar/gkaa942

  • 21

    HuangL.ChenK.CaiZ.-P.ChenF.-C.ShenH.-Y.ZhaoW.-H.et al (2017). DEPDC1 promotes cell proliferation and tumor growth via activation of E2F signaling in prostate cancer.Biochem. Biophys. Res. Commun.490707–712. 10.1016/j.bbrc.2017.06.105

  • 22

    JainS.LyonsC. A.WalkerS. M.McQuaidS.HynesS. O.MitchellD. M.et al (2018). Validation of a Metastatic Assay using biopsies to improve risk stratification in patients with prostate cancer treated with radical radiation therapy.Ann. Oncol.29215–222. 10.1093/annonc/mdx637

  • 23

    JurmeisterS.Ramos-MontoyaA.SandiC.Pértega-GomesN.WadhwaK.LambA. D.et al (2018). Identification of potential therapeutic targets in prostate cancer through a cross-species approach.EMBO Mol. Med.10:e8274. 10.15252/emmm.201708274

  • 24

    KassambaraA.KosinskiM.BiecekP.FabianS. (2017). Package ‘survminer’. Drawing Survival Curves using ‘ggplot2’(R package version 03 1).

  • 25

    KoldeR.KoldeM. R. (2015). Package ‘pheatmap’.R Package1790.

  • 26

    KuhnM. (2008). Building predictive models in R using the caret package.J. Stat. Softw.281–26.

  • 27

    KunerR.FälthM.PressinottiN. C.BraseJ. C.PuigS. B.MetzgerJ.et al (2013). The maternal embryonic leucine zipper kinase (MELK) is upregulated in high-grade prostate cancer.J. Mol. Med. (Berl.)91237–248. 10.1007/s00109-012-0949-1

  • 28

    LabbéD. P.ZadraG.YangM.ReyesJ. M.LinC. Y.CacciatoreS.et al (2019). High-fat diet fuels prostate cancer progression by rewiring the metabolome and amplifying the MYC program.Nat. Commun.10:4358. 10.1038/s41467-019-12298-z

  • 29

    LangfelderP.HorvathS. (2008). WGCNA: an R package for weighted correlation network analysis.BMC Bioinform.9:559. 10.1186/1471-2105-9-559

  • 30

    LiG.JiangY.LyuX.CaiY.ZhangM.WangZ.et al (2019). Deconvolution and network analysis of IDH-mutant lower grade glioma predict recurrence and indicate therapeutic targets.Epigenomics111323–1333. 10.2217/epi-2019-0137

  • 31

    LiX.JiaoM.HuJ.QiM.ZhangJ.ZhaoM.et al (2020). miR-30a inhibits androgen-independent growth of prostate cancer via targeting MYBL2, FOXD1, and SOX4.Prostate80674–686. 10.1002/pros.23979

  • 32

    LiberzonA.SubramanianA.PinchbackR.ThorvaldsdóttirH.TamayoP.MesirovJ. P. (2011). Molecular signatures database (MSigDB) 3.0.Bioinformatics271739–1740. 10.1093/bioinformatics/btr260

  • 33

    LoveM. I.HuberW.AndersS. (2014). Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2.Genome Biol.15:550.

  • 34

    MehraR.TomlinsS. A.ShenR.NadeemO.WangL.WeiJ. T.et al (2007). Comprehensive assessment of TMPRSS2 and ETS family gene aberrations in clinically localized prostate cancer.Mod. Pathol.20538–544.

  • 35

    MikeskaT.CraigJ. M. (2014). DNA methylation biomarkers: cancer and beyond.Genes (Basel)5821–864. 10.3390/genes5030821

  • 36

    MontiS.TamayoP.MesirovJ.GolubT. (2003). Consensus clustering: a resampling-based method for class discovery and visualization of gene expression microarray data.Mach. Learn.5291–118. 10.1023/A:1023949509487

  • 37

    MorganD. O. (2007). The Cell Cycle: Principles of Control.London: New science press.

  • 38

    NewmanA. M.SteenC. B.LiuC. L.GentlesA. J.ChaudhuriA. A.SchererF.et al (2019). Determining cell type abundance and expression from bulk tissues with digital cytometry.Nat. Biotechnol.37773–782. 10.1038/s41587-019-0114-2

  • 39

    PanH.-W.SuH.-H.HsuC.-W.HuangG.-J.WuT. T.-L. (2017). Targeted TPX2 increases chromosome missegregation and suppresses tumor cell growth in human prostate cancer.Onco Targets Ther.103531–3543. 10.2147/OTT.S136491

  • 40

    PengD.GeG.XuZ.MaQ.ShiY.ZhouY.et al (2018). Diagnostic and prognostic biomarkers of common urological cancers based on aberrant DNA methylation.Epigenomics101189–1199. 10.2217/epi-2018-0017

  • 41

    PernerS.DemichelisF.BeroukhimR.SchmidtF. H.MosqueraJ.-M.SetlurS.et al (2006). TMPRSS2:ERG fusion-associated deletions provide insight into the heterogeneity of prostate cancer.Cancer Res.668337–8341.

  • 42

    RajputA. B.MillerM. A.De LucaA.BoydN.LeungS.Hurtado-CollA.et al (2007). Frequency of the TMPRSS2:ERG gene fusion is increased in moderate to poorly differentiated prostate cancers.J. Clin. Pathol.601238–1243.

  • 43

    Ross-AdamsH.LambA. D.DunningM. J.HalimS.LindbergJ.MassieC. M.et al (2015). Integration of copy number and transcriptomics provides risk stratification in prostate cancer: a discovery and validation cohort study.EBioMedicine21133–1144. 10.1016/j.ebiom.2015.07.017

  • 44

    SiegelR. L.MillerK. D.JemalA. (2020). Cancer statistics, 2020.CA Cancer J. Clin.707–30. 10.3322/caac.21590

  • 45

    SkidmoreZ. L.WagnerA. H.LesurfR.CampbellK. M.KunisakiJ.GriffithO. L.et al (2016). GenVisR: genomic visualizations in R.Bioinformatics323012–3014. 10.1093/bioinformatics/btw325

  • 46

    SubramanianA.TamayoP.MoothaV. K.MukherjeeS.EbertB. L.GilletteM. A.et al (2005). Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles.Proc. Natl. Acad. Sci. U.S.A.10215545–15550.

  • 47

    SvaneA. M.SoerensenM.LundJ.TanQ.JylhäväJ.WangY.et al (2018). DNA methylation and all-cause mortality in middle-aged and elderly danish twins.Genes (Basel)9:78. 10.3390/genes9020078

  • 48

    TherneauT. M.LumleyT. (2014). Package ‘survival’.Survival Anal.2:3.

  • 49

    TuJ. J.RohanS.KaoJ.KitabayashiN.MathewS.ChenY.-T. (2007). Gene fusions between TMPRSS2 and ETS family genes in prostate cancer: frequency and transcript variant analysis by RT-PCR and FISH on paraffin-embedded tissues.Mod. Pathol.20921–928.

  • 50

    Van CalsterB.WynantsL.VerbeekJ. F. M.VerbakelJ. Y.ChristodoulouE.VickersA. J.et al (2018). Reporting and interpreting decision curve analysis: a guide for investigators.Eur. Urol.74796–804. 10.1016/j.eururo.2018.08.038

  • 51

    WangY.-P.LeiQ.-Y. (2018). Metabolic recoding of epigenetics in cancer.Cancer Commun. (Lond.)38:25. 10.1186/s40880-018-0302-3

  • 52

    YoshiharaK.WangQ.Torres-GarciaW.ZhengS.VegesnaR.KimH.et al (2015). The landscape and therapeutic relevance of cancer-associated transcript fusions.Oncogene344845–4854. 10.1038/onc.2014.406

  • 53

    YoshimotoM.JoshuaA. M.Chilton-MacneillS.BayaniJ.SelvarajahS.EvansA. J.et al (2006). Three-color FISH analysis of TMPRSS2/ERG fusions in prostate cancer indicates that genomic microdeletion of chromosome 21 is associated with rearrangement.Neoplasia8465–469.

  • 54

    YuG.WangL.-G.HanY.HeQ.-Y. (2012). clusterProfiler: an R package for comparing biological themes among gene clusters.OMICS16284–287. 10.1089/omi.2011.0118

  • 55

    ZhangE.HeJ.ZhangH.ShanL.WuH.ZhangM.et al (2020a). Immune-related gene-based novel subtypes to establish a model predicting the risk of prostate cancer.Front. Genet.11:595657. 10.3389/fgene.2020.595657

  • 56

    ZhangE.HouX.HouB.ZhangM.SongY. (2020b). A risk prediction model of DNA methylation improves prognosis evaluation and indicates gene targets in prostate cancer.Epigenomics12333–352. 10.2217/epi-2019-0349

  • 57

    ZhangE.ZhangM.ShiC.SunL.ShanL.ZhangH.et al (2020c). An overview of advances in multi-omics analysis in prostate cancer.Life Sci.260:118376. 10.1016/j.lfs.2020.118376

  • 58

    ZhangH.LiS.YangX.QiaoB.ZhangZ.XuY. (2016). miR-539 inhibits prostate cancer progression by directly targeting SPAG5.J. Exp. Clin. Cancer Res.35:60. 10.1186/s13046-016-0337-8

  • 59

    ZhangP.WangD.ZhaoY.RenS.GaoK.YeZ.et al (2017). Intrinsic BET inhibitor resistance in SPOP-mutated prostate cancer is mediated by BET protein stabilization and AKT-mTORC1 activation.Nat. Med.231055–1062. 10.1038/nm.4379

  • 60

    ZhaoS. G.ChenW. S.LiH.FoyeA.ZhangM.SjöströmM.et al (2020). The DNA methylation landscape of advanced prostate cancer.Nat. Genet.52778–789. 10.1038/s41588-020-0648-8

  • 61

    ZouJ.HuangR.-Y.JiangF.-N.ChenD.-X.WangC.HanZ.-D.et al (2018). Overexpression of TPX2 is associated with progression and prognosis of prostate cancer.Oncol. Lett.162823–2832. 10.3892/ol.2018.9016

Summary

Keywords

prostate cancer, DNA methylation, predictive signature, prognosis, RND3, hypermethylation, systems biology

Citation

Zhang E, Shiori F, Mu OY, He J, Ge Y, Wu H, Zhang M and Song Y (2021) Establishment of Novel DNA Methylation-Based Prostate Cancer Subtypes and a Risk-Predicting Eight-Gene Signature. Front. Cell Dev. Biol. 9:639615. doi: 10.3389/fcell.2021.639615

Received

05 January 2021

Accepted

02 February 2021

Published

23 February 2021

Volume

9 - 2021

Edited by

Tao Huang, Shanghai Institute of Nutrition and Health (CAS), China

Reviewed by

Heejoon Chae, Sookmyung Women’s University, South Korea; Siyuan Han, The First Affiliated Hospital of China Medical University, China

Updates

Copyright

*Correspondence: Mo Zhang, Yongsheng Song, ;

This article was submitted to Epigenomics and Epigenetics, a section of the journal Frontiers in Cell and Developmental Biology

Disclaimer

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.

Outline

Figures

Cite article

Copy to clipboard


Export citation file


Share article

Article metrics