Edited by: Ming Tang, Dana–Farber Cancer Institute, United States
Reviewed by: Yang Liu, Dana–Farber Cancer Institute, United States; Nathan Lack, Koç University, Turkey
†These authors have contributed equally to this work and share first authorship
‡These authors have contributed equally to this work
This article was submitted to Cancer Genetics and Oncogenomics, a section of the journal Frontiers in Genetics
This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
Prostate cancer (PCa) is the second most common malignancy in men, but its exact pathogenetic mechanisms remain unclear. This study explores the effect of enhancer RNAs (eRNAs) in PCa. Firstly, we screened eRNAs and eRNA -driven genes from The Cancer Genome Atlas (TCGA) database, which are related to the disease-free survival (DFS) of PCa patients;. screening methods included bootstrapping, Kaplan–Meier (KM) survival analysis, and Pearson correlation analysis. Then, a risk score model was established using multivariate Cox analysis, and the results were validated in three independent cohorts. Finally, we explored the function of eRNA-driven genes through enrichment analysis and analyzed drug sensitivity on datasets from the Genomics of Drug Sensitivity in Cancer database. We constructed and validated a robust prognostic gene signature involving three eRNA-driven genes namely
Prostate cancer (PCa) is the second most common malignancy in men (
Enhancer RNAs (eRNAs)—important components of long non-coding RNAs (lncRNAs)—are transcribed from enhancer regions and important in gene transcriptional regulation (
With the advent of high-throughput sequencing, it is possible to explore the function of eRNAs across the whole genome. Previous studies have conducted preliminary investigations of the functions of eRNAs. For example, Xiaolian et al. found that AP001056.1—a key immune-related eRNA in head and neck squamous cell carcinoma (SCCHN)—has a positive effect on clinical outcomes in patients with SCCHN (
The transcriptome data of PCa and paracancerous tissues, as well as clinical data of patients with PCa as per The Cancer Genome Atlas (TCGA) database, were obtained from the National Institutes of Health (NIH) Genomic Data Commons (GDC Data Portal
To preliminarily screen eRNAs related to the prognosis of patients with PCa, we analyzed the eRNAs and their expression levels along with their corresponding clinical disease-free survival (DFS) rates. Through application of bootstrapping methods and the Kaplan–Meier (KM) survival analysis, candidate eRNAs were narrowed down as follows: 70% of the samples were randomly selected from the TCGA data cohort for gene survival analysis. The process was repeated 1,000 times. Genes that appeared more than 700 times among the samples (robustness test,
For the training cohort from TCGA dataset, we adopted a least absolute shrinkage and selection operator (LASSO) regression analysis using the R package glmnet 4.0-2 (
risk score =
Due to the differences between the sequencing data and the chip data, the risk scores of the training set (TCGA) and the Validation sets (GSE55460, GSE70768, and MSKCC) were calculated, respectively, according to the above risk scoring calculation formula. Then, using the median value of their risk scores, we divided the training set and the Validation sets into high- and low-risk groups, respectively. We further used the survival data of the two groups of patients with PCa, to generate KM survival curves through a log-rank test (the model has predictive value when
We identified the differentially expressed genes (DEGs)in high- and low-risk groups from TCGA cohort using the R package limma 3.42.2. An absolute log2 fold change (| FC|) > 1 and an adjusted false discovery rate (FDR) < 0.05 were set as cutoff criteria. Using the R package limma 3.42.2, 609 DEGs were subjected to Gene Ontology (GO) enrichment analysis. Furthermore, Gene Set Enrichment Analysis (GSEA) was conducted using GSEA 3.0, per the methods specified in the user guide (gsea-3.0.jar
Using the R package survival 3.2-7, we performed univariate Cox regression analysis to appraise the predictive value of the risk score and other clinical factors for DFS in patients with PCa. Then, we conducted multivariate Cox regression analysis to remove confounders. The bilateral significance level was 0.05, and the hazard ratio (HR) and 95% confidence interval (CI) were calculated.
Combining the prognostic factors analyzed using the above methods, we used the R package rms 6.1-0 to construct a nomogram. Using the R package pec v2019.11.3 (
We used R package pRRophetic to compare half maximal inhibitory concentrations (IC50) of two drugs (docetaxel, bicalutamide) used in treatment of PCa, as per the Genomics of Drug Sensitivity in Cancer database. Results were recorded for, and compared between, the high- and low-risk groups. In addition, we used the Connectivity Map (CMap)
All statistical analyses were performed using R software (version 3.6.3). The Mann–Whitney
Clinical information from the 545 PCa patients of TCGA.
Clinical parameters | Variable | Number (total) | Percentage | Hazard ratio |
Age (years) | ≤60 | 242 | 44.40% | 1.338 (0.874–2.047) |
>60 | 303 | 55.60% | ||
Survival status | Dead | 10 | 1.83% | Not applicable |
Alive | 482 | 88.44% | ||
Unknown | 53 | 9.72% | ||
Gleason score | 6 | 48 | 8.81% | 3.907 (2.471–6.178) |
7 | 285 | 52.29% | ||
8 | 66 | 12.11% | ||
9 and 10 | 146 | 26.78% | ||
T grade | T2 | 188 | 34.50% | 4.018 (2.224–7.256) |
T3 | 295 | 54.13% | ||
T4 | 10 | 1.83% | ||
Unknown | 52 | 9.54% | ||
N grade | N0 | 348 | 63.85% | 1.820 (1.112–2.980) |
N1 | 79 | 14.50% | ||
Unknown | 118 | 21.65% |
In order to verify that the Bootstrap method is more rational, we set six nodes (20, 50, 100, 250, 500, and 1,000) and compared the results at each node as shown in
Through comparative examinations, we screened 26 regulatory relationships and performed correlation analysis of these relationships (cor > 0.3,
List of eRNAs and eRNAs-driven genes associated with overall survival derived from enhancers.
Ensembl ID | Symbol | TANRIC Overall Survival Analysis, Log-Rank |
Target gene | Correlation between lncRNA and the Neighboring Target | |
Correlation coefficient | |||||
ENSG00000223959 | AFG3L1P | 0.004207 | FANCA | 0.418196 | |
ENSG00000223959 | AFG3L1P | 0.004207 | MC1R | 0.70217 | |
ENSG00000223959 | AFG3L1P | 0.004207 | SPIRE2 | 0.556318 | |
ENSG00000161912 | ADCY10P1 | 4.72E-05 | UNC5CL | 0.64206 | |
ENSG00000206417 | H1FX-AS1 | 7.75E-06 | RPL32P3 | 0.372866 | |
ENSG00000242258 | LINC00996 | 0.002828 | GIMAP4 | 0.678696 | |
ENSG00000254812 | AC067930.3 | 0.006284 | MAPK15 | 0.435871 | |
ENSG00000197558 | SSPOP | 0.000238 | ZNF467 | 0.603343 | |
ENSG00000227297 | SSPOP | 0.000863 | ZNF862 | 0.68106 |
The nine target genes were filtered using LASSO analysis. The filter condition was set to repeat 1,000 times (
Development and validation of the risk model.
Risk genes in the prognostic risk model.
Gene | Coef | HR | HR.95L | HR.95H | |
MAPK15 | 0.186338 | 1.204829 | 0.977376 | 1.485215 | 0.080882 |
ZNF467 | 0.523366 | 1.687699 | 1.345957 | 2.116209 | 5.80E-06 |
MC1R | 0.624495 | 1.867304 | 1.332502 | 2.616749 | 0.000286 |
The AUC values for 1-, 3-, and 5-year survival were 0.710, 0.748, and 0.755, respectively (
We also analyzed the performance of individual model genes in the training set and showed that the AUC values for 1-, 3-, and 5-year survival of MAPK15 were 0.671, 0.718, and 0.731, respectively (
The prognostic risk model was further validated in the GEO datasets GSE54460, GSE70768, and MSKCC. The clinical information of the validation cohort is summarized in
Clinical recurrence rate in TCGA, GES54460, GES70768, and MKSCC database.
Dataset | Clinical parameter | |||||
Recurrence | No recurrence | |||||
1 year | 3 years | 5 years | 1 year | 3 years | 5 years | |
TCGA ( |
29 (5.93%) | 67 (17.30%) | 81 (16.56%) | 460 (94.07%) | 422 (82.70%) | 408 (83.44%) |
GES54460 ( |
24 (22.64%) | 45 (42.45%) | 51 (48.11%) | 82 (77.36) | 61 (57.55%) | 55 (51.89%) |
GES70768 ( |
8 (7.20%) | 16 (14.41%) | 19 (17.11%) | 103 (92.79%) | 95 (85.59%) | 92 (82.89%) |
MKSCC ( |
15 (11.63%) | 28 (21.71%) | 31 (24.03%) | 114 (88.37%) | 101 (78.29%) | 98 (75.97%) |
To explore the single predictive value of the three-gene signature, we visualized the differential expression of each gene in samples with different clinical features. These included normal tissue and tumor samples, samples with T- and N-category characteristics, and samples differentiated according to their Gleason score and risk (TCGA dataset). Among these groups, those involving tumor samples, specifically category T3/T4 tumors, with N1 nodal status, high-risk, and Gleason score > 7 exhibited the highest gene expression levels (
The differential analysis of three risk model genes between PRAD tissue and normal prostate tissue in TCGA
We screened clinical variables with independent prognostic value using univariate and multivariate Cox regression analyses (
By comparing the gene expression levels in samples from the high- and low-risk groups of the training cohort, 609 DEGs (| FC| ≥ 1, adjusted
GSEA was used to explore the potential signaling pathways that influence the risk score model in the high- and low-risk groups (
Based on the GO enrichment analysis results, the genes included in the risk model may affect the occurrence and development of tumors, through their impact on certain immune-related functions such as immunoglobulin complex. Therefore, we compared the tumor microenvironments (TMEs) in the high- and low-risk groups.
We calculated the single-sample GSEA (ssGSEA) scores of EMT, TGF-β, and ECM to compare the differences between the high- and low-risk groups, based on stromal cells and showed that the ECM and TGF-β scores of the low-risk group were higher than those of the high-risk group. No significant difference was seen in the EMT score between the high- and low-risk groups (
We further explored immune cell infiltration to determine the immune status of the high- and low-risk groups and showed that CD4+ T cell and CD8+ T cell counts differed significantly between the two groups (
We also compared the levels of expression of 14 immune checkpoints between the high- and low-risk groups and showed that the expression of nine of these immune checkpoints (LAG3, CD80, CD276, PDCD1, IL6, CD4, TGFB1, CD86, CTLA4) were significantly different between the high- and low-risk groups (
Based on the GSEA results, base excision repair was associated with the heterogeneity between the high- and low-risk groups. We then explored the differences in the DNA mutation burden, tumor mutational burden (TMB), and microsatellite instability (MSI), between the two groups.
By analyzing the 20 genes with the highest mutation frequency, we found a marked difference in mutation burden between the high- and low-risk groups. The genes with the highest mutational frequency in the high- and low-risk groups, were TP53 and SPOP, respectively (
Association between high-low risk groups and DNA mutation.
Comparing the differences in TMB between the high- and low-risk groups showed that the TMB of the high-risk group was significantly different to that of the low-risk group (
Differential analysis of the MSI of each PRAD sample obtained from a previous study (
We assessed differences in drug sensitivity between the high- and low-risk groups by analyzing the IC50 of two chemotherapeutic agents and showed that, among the drugs for which significantly different sensitivities were perceived between the high- and low-risk groups, patients in the high-risk group were more sensitive to docetaxel. Conversely, patients in the low-risk group were comparatively more sensitive to bicalutamide (
Drug sensibility analysis with risk model. Differences in the IC50 of five drugs [bicalutamide
Genomic and epigenomic changes in tumor cells are associated with certain tumor factors, including cellular proliferation and oncogenic transformation (
Through functional analysis, we found that the eRNAs and corresponding eRNA-driven genes may affect tumor development through certain immune processes. Previous studies have shown that immune cells are important in hindering tumor progression, and that the TME plays a key role in the occurrence and development of PCa and other cancers (
The results of the functional analysis indicated that the eRNAs and corresponding eRNA-driven genes also influence the progression of PCa by affecting the base excision repairp rocess. We compared the tumor mutation spectrum, TMB, and MSI between the high- and low-risk groups and showed that the gene with the highest mutation frequency in the high-risk group was TP53, whereas in the low-risk group it was SPOP. TP53 mutations, which can significantly promote tumor growth, are consequentially associated with the development of PCa (
Based on the outcomes of our study, our method of constructing a risk score model is more reliable in producing a robust model than existing methods, and that our study was well-founded as the mechanisms by which eRNAs affect the occurrence and development of PCa, as well as their involvement in potential therapeutic options, were explored on many levels and through various methods. However, our study has some limitations. Firstly, the data were not from our own database but rather retrieved from public databases. Secondly, the mechanisms of action of the eRNAs and target genes in PCa, were not further investigated through
In summary, we used expression profiles from public databases to screen genes closely associated with the prognosis of patients with PCa and established a prognostic risk model, which was validated in multiple datasets. Differential analyses showed significant differences in TME, TMB, and MSI between the high- and low-risk groups. The genes included in the three-gene signature may affect tumor progression, by impacting the aforementioned factors. Therefore, the risk score derived from our model is a good predictor of the prognosis of patients with PCa and can be used to explore how eRNAs affect the occurrence and development of PCa. Drug sensitivity analysis showed that respondents in the high- and low-risk groups differed in their sensitivity to several chemotherapeutic agents used to treat PCa. Our model could predict drug sensitivity, and several candidate drugs that may inhibit the progression of PCa were identified. These findings can help to design personalized medication regimes for patients and exploring more effective drug options for PCa treatment.
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/
SF, ZW, and LZ designed the study, analyzed the data, revised the images, performed the literature search, and collected data for the manuscript. CZ, JW, and DY revised the manuscript. All authors contributed to the article and approved the submitted version.
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
The authors would like to thank all the patients who participated in the study.
The Supplementary Material for this article can be found online at:
Expression relationships of six eRNAs and corresponding eRNAs-driven Genes.
KM survival curves for six eRNAs and nine target genes.
Correlation analysis of three model genes’ copy number variation regions with target genes [MAPK15
Time-dependent ROC curve analysis of the model genes [MAPK15
The correlation of riskscore and Gleason score (≤7, >7)
The rows and columns indicate the genes and tumor samples, respectively. Enrichment plots of the top five KEGG pathways in the high-risk score
Analysis of the difference of immune infiltration between the two algorithms of epic
In 1,000 iterations, eRNAs and target genes appeared more than or equal to 700 times.
Functional markers of nine eRNAs-driven genes.
Clinical factors were analyzed by univariate and multivariate Cox.
130 GO terms of differential expression genes.
GSEA results of the KEGG pathways in the high-risk score and low-risk score groups in PRAD.
10 Important Small Molecule Drugs that may reverse tumor status of PCa.
Prostate cancer
Enhancer RNAs
Long non-coding RNAs
head and neck squamous cell carcinoma
Gene ontology
Gene Set Enrichment Analysis
differentially expressed genes
The Cancer Genome Atlas
National Institutes of Health
Genomic Data Commons
RNA sequencing
Gene Expression Omnibus
Tumor Immune Estimation Resource 2.0
prostate adenocarcinoma
Genomics of Drug Sensitivity in Cancer
Epithelial-mesenchymal transformation
transforming growth factor β
extracellular matrix
Molecular Signatures Database
disease-free survival
Kaplan–Meier
least absolute shrinkage and selection operator
receiver operator characteristic
area under the curve
false discovery rate
fold change
concordance index
tumor microenvironment
tumor mutational burden
microsatellite instability
half-inhibitory concentration.