Edited by: Michael Liebman, IPQ Analytics, United States
Reviewed by: Quan Cheng, Central South University, China; Lu Xie, Shanghai Center for Bioinformation Technology, China
This article was submitted to Epigenomics and Epigenetics, a section of the journal Frontiers in Cell and Developmental Biology
This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
Bladder cancer (BLCA) is one of the most common cancers worldwide. In a large proportion of BLCA patients, the disease recurs and/or progresses after resection, which remains a major clinical issue in BLCA management. Therefore, it is vital to identify prognostic biomarkers for treatment stratification. We investigated the potential of CpG methylation as a prognostic biomarker for patients with BLCA.
Overall, 357 BLCA patients from The Cancer Genome Atlas (TCGA) were randomly separated into training and internal validation cohorts. Least absolute shrinkage and selection operator (LASSO) and support vector machine-recursive feature elimination (SVM-RFE) were used to select candidate CpGs and build the methylation risk score model, whose prognostic value was validated in the validation cohort by Kaplan–Meier analysis. Hazard curves were generated to reveal the risk nodes throughout the follow-up. Gene Set Enrichment Analysis (GSEA) was used to reveal the potential biological pathways associated with the methylation model. Quantitative real-time polymerase chain reaction (PCR) and western blotting were performed to verify the expression level of the methylated genes.
After incorporating the CpGs obtained by the two algorithms, CpG methylation sites corresponding to eight genes (TNFAIP8L3, KRTDAP, APC, ZC3H3, COL9A2, SLCO4A1, POU3F3, and ADARB2) emerged as prominent candidate predictors for establishing a methylation risk score for BLCA (MRSB), which was used to divide the patients into high- and low-risk progression groups (
We developed the MRSB, an eight-gene-based methylation signature, which has great potential to be used to predict the post-surgery progression risk of BLCA.
Bladder cancer (BLCA) is one of the most common cancers. Seventy percent of cases present as non-muscle invasive lesions (NMIBCs), and approximately 25–75% of high-risk NMIBC patients progress to muscle invasive cancer (MIBC) and then to metastatic cancer (
Because alterations in aberrant methylation are relatively stable and may be reversible therapeutically, considerable attention has been focused on them recently (
In this study, we identified and validated progression-related CpGs in BLCA. Using machine learning, we analyzed 450K-array DNA methylation data from The Cancer Genome Atlas (TCGA)-BLCA database and built a predictive model, the methylation risk score for BLCA (MRSB), from eight specific CpGs to predict the progression-free survival (PFS) of BLCA patients. We identified the time points at which the risk of adverse events peaked after resection, which may allow more efficient treatment to prevent poor outcomes in high-risk patients. We further demonstrated that the mRNA and protein levels of the MRSB component-related gene TNFAIP8L3 were prominently upregulated in BLCA tissues compared to adjacent tissues. In short, our study identified a prognostic panel that provides novel insight into cancer progression and an opportunity for stratified therapeutic strategies for patients with BLCA.
Paired cancer and adjacent tissue samples from 18 patients were collected between July 2017 and June 2019 at the First Clinical Hospital of Zhengzhou University (ZZU cohort). None of the patients had previously received any special treatments. The project was approved by the Ethics Committee of Zhengzhou University, and all patients signed informed consent forms. Patient tissues were stored in liquid nitrogen until they were used for the detection of mRNA and protein expression levels.
The research protocol is illustrated in
Data generation and analysis process of this study. The differentially methylated CpGs between cancer and normal tissues in the TCGA bladder cancer cohort were first identified. After excluding patients with competing events, the SVM-RFE and LASSO algorithms were used to identify candidate methylation sites, and their results were combined. Multivariate Cox analysis was performed to establish the prognostic model, the MRSB, which was validated in the internal validation cohort. Finally, nomograms were established with MRSB and clinical covariates. TCGA, The Cancer Genome Atlas; SVM-RFE, support vector machine-recursive feature elimination; LASSO, least absolute shrinkage and selection operator; BLCA, bladder cancer; MRSB, methylation risk score for bladder cancer.
The differentially methylated CpG sites between BLCA and adjacent normal tissues were selected using the “limma” package (
CpG sites meeting the criteria described above were used as input for machine learning. The “glmnet” R package was used to implement the least absolute shrinkage and selection operator (LASSO) algorithm (
Two algorithms were used for selecting the candidate CpGs.
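The consensus rule reported for the iterative LASSO step (CpGs that appeared more than 500 times in 1,000 iterations) can be illustrated with a minimal sketch. It only counts how often each CpG is selected across repeated runs and is not the authors' implementation. The probe IDs, the toy selections, and the function name `consensus_features` are hypothetical.

```python
from collections import Counter


def consensus_features(selections, min_count):
    """Return features selected in more than `min_count` iterations.

    `selections` holds one collection of feature IDs per iteration,
    e.g. the CpGs with non-zero coefficients in each LASSO run.
    """
    counts = Counter()
    for selected in selections:
        counts.update(set(selected))  # count each feature once per iteration
    return sorted(f for f, n in counts.items() if n > min_count)


# Toy example with hypothetical probe IDs: 10 iterations, threshold "more than 5".
runs = [["cg01", "cg02"]] * 6 + [["cg02", "cg03"]] * 4
print(consensus_features(runs, min_count=5))
```

With 1,000 iterations, the reported rule corresponds to `min_count=500`. A CpG selected exactly 500 times would not qualify under a strict "more than" threshold.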
All possible stepwise increases in the number of candidate CpGs, from one to eight, were tested to obtain the best classification accuracy of patients in the high- and low-risk groups (
Building the methylation risk score for bladder cancer (MRSB) in the training cohort.
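As a rough illustration of how a Cox-based methylation risk score might be computed and used to stratify patients, the sketch below assumes a conventional linear predictor (the sum of coefficient times methylation beta value) and a median cutoff. The source states neither the formula nor the cutoff, so both are assumptions. The coefficients, probe names, and patient values are hypothetical and are not the MRSB parameters.

```python
from statistics import median


def risk_score(betas, coefficients):
    """Linear predictor: sum of coefficient * methylation beta value."""
    return sum(coefficients[cpg] * betas[cpg] for cpg in coefficients)


def stratify(scores, cutoff=None):
    """Label each patient 'high' or 'low' risk; the median cutoff is an assumption."""
    if cutoff is None:
        cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


# Hypothetical coefficients and beta values, for illustration only.
coefficients = {"cpg_a": 1.0, "cpg_b": -2.0}
patients = {
    "p1": {"cpg_a": 0.8, "cpg_b": 0.1},
    "p2": {"cpg_a": 0.2, "cpg_b": 0.4},
    "p3": {"cpg_a": 0.5, "cpg_b": 0.25},
}
scores = {pid: risk_score(b, coefficients) for pid, b in patients.items()}
groups = stratify(scores)
print(groups)
```

In practice the coefficients would come from the fitted multivariate Cox model, and the cutoff would be fixed in the training cohort before being applied to the validation cohort.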
Decision curve analysis was executed to select the optimal clinical variables to incorporate for constructing a nomogram (
TRIzol reagent (Invitrogen, United States) was used to extract total RNA from the tissues, which was then reverse transcribed following the manufacturer’s protocol (Takara Bio, Japan). Real-time quantitative polymerase chain reaction (RT-qPCR) assays were performed using the PowerUp SYBR-Green master mix kit (Thermo Fisher Scientific, United States) and the QuantStudio 6 System (Thermo Fisher Scientific, United States). The following primers were used: TNFAIP8L3 forward primer: ATTGATGACACCAGCAGCGA; reverse primer: GAGGAACTCCACATCGGCAA. Glyceraldehyde 3-phosphate dehydrogenase (GAPDH) forward primer: GACCTGACCTGCCGCCTA; reverse primer: AGGAGTGGGTGTCGCTGT. mRNA expression was normalized to GAPDH, and the data were analyzed using the comparative Ct method (2^(−ΔΔCt)).
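The comparative Ct calculation can be sketched as follows. The function name and the Ct values are hypothetical. The reference gene plays the role of GAPDH, and the adjacent tissue is taken as the calibrator, which is an assumption about how the paired samples were compared.

```python
def relative_expression(ct_target_tumor, ct_ref_tumor,
                        ct_target_normal, ct_ref_normal):
    """Fold change of a target gene in tumor vs. adjacent tissue, 2^(-ddCt)."""
    # Normalize the target gene to the reference gene within each sample.
    delta_ct_tumor = ct_target_tumor - ct_ref_tumor
    delta_ct_normal = ct_target_normal - ct_ref_normal
    # Compare the tumor sample with its paired adjacent tissue (calibrator).
    delta_delta_ct = delta_ct_tumor - delta_ct_normal
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values for one paired sample.
fold_change = relative_expression(24.0, 18.0, 26.0, 18.0)
print(fold_change)  # 4.0: the target is four-fold higher in the tumor sample
```

A fold change above 1 indicates higher normalized expression in the tumor than in the adjacent tissue, and a value of 1 indicates no difference.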
Total protein was extracted from the tissues of the ZZU BLCA patients using radioimmunoprecipitation assay (RIPA) buffer. Bicinchoninic acid (BCA) assays (Beyotime, China) were then performed to quantify the protein. Equal amounts of protein samples were separated by 10% sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE) and then transferred to polyvinylidene difluoride (PVDF) membranes (Millipore, United States). The membranes were blocked with 5% non-fat milk in Tris-buffered saline Tween-20 (TBST) for 2 h. Then, the membranes were incubated at 4°C overnight with the following primary antibodies: anti-TNFAIP8L3 (1:1,000; Invitrogen, United States) and anti-GAPDH (1:10,000; Proteintech, United States). After being washed with TBST three times, the membranes were further incubated with secondary antibodies [alkaline phosphatase-conjugated AffiniPure goat anti-rabbit IgG (H+L) (1:10,000; Proteintech, United States) or alkaline phosphatase-conjugated AffiniPure goat anti-mouse IgG (H+L) (1:10,000; Proteintech, United States)] for 2 h at 37°C. The immunoreactive bands were visualized using an enhanced chemiluminescence (ECL) system (FluorChem E; ProteinSimple, United States).
Gene Set Enrichment Analysis (GSEA) was executed to identify potential biological pathways/processes affected by the MRSB DNA methylation model and corresponding genes (
All statistical analyses were performed in R software (Version 3.6.4). The “survival” R package was used to perform the Kaplan–Meier survival analysis and log-rank test. PFS was defined as the time after treatment during which patients lived with the disease without it worsening, or until the last follow-up. Overall survival (OS) was defined as the time from the date of diagnosis or the start of treatment to death or the last follow-up. Recurrence-free survival (RFS) was defined as the time from treatment until disease recurrence, metastasis, or last follow-up. Statistical significance was defined as
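To make the Kaplan–Meier step concrete, the product-limit estimator can be sketched in plain Python. This is an illustration of the estimator itself, not the "survival" package or the authors' code, and the follow-up times and event indicators below are invented toy data.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Product-limit survival estimate at each distinct event time.

    times  -- follow-up time per patient (e.g. months)
    events -- 1 if the event (e.g. progression) was observed, 0 if censored
    """
    event_counts = Counter(t for t, e in zip(times, events) if e)
    exit_counts = Counter(times)   # events plus censored, per time point
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exit_counts):
        d = event_counts.get(t, 0)
        if d:
            # Patients censored at t are still counted as at risk at t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exit_counts[t]
    return curve


# Toy data: six patients, two of them censored.
times = [5, 8, 8, 12, 20, 30]
events = [1, 1, 0, 1, 0, 1]
for t, s in kaplan_meier(times, events):
    print(t, round(s, 3))
```

Curves estimated this way for the high- and low-risk groups are what the log-rank test compares.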
Following the protocol illustrated in
Both SVM-RFE and iterative LASSO were used to identify the most significant CpGs for classifying patients into high- and low-risk progression in the training group. A total of 130 CpGs (
The TCGA patients were randomly divided into training (250 patients, 70%) and internal validation (107 patients, 30%) groups using a random allocation sequence, and there were no significant differences in progression risk status, age, sex, cancer stage, or grade between the training and validation cohorts (
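A 70/30 random allocation of this kind can be sketched as below. The patient IDs, the seed, and the function name are placeholders, and the actual allocation sequence used in the study is not given in the text.

```python
import random


def split_cohort(patient_ids, train_fraction=0.7, seed=0):
    """Shuffle patient IDs reproducibly and split them into two cohorts."""
    ids = list(patient_ids)
    rng = random.Random(seed)      # fixed seed so the split can be reproduced
    rng.shuffle(ids)
    n_train = round(len(ids) * train_fraction)
    return ids[:n_train], ids[n_train:]


patients = [f"P{i:03d}" for i in range(357)]  # placeholder IDs
train, validation = split_cohort(patients)
print(len(train), len(validation))  # 250 107
```

Comparing baseline characteristics between the two cohorts afterwards, as the authors report doing, checks that the random split did not produce an unbalanced validation set.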
To validate the effectiveness of the MRSB, we performed validation analysis in the internal validation cohort (
The performance of MRSB in predicting progression, survival and recurrence.
A kernel-smoothing hazard rate function was used to reveal the time to cancer progression. The risk increased steeply toward the first peak at approximately 9–10 months after treatment for the MRSB high-risk group, and the second peak occurred at approximately 30 months after resection; however, there was no noteworthy peak for the MRSB low-risk group during the follow-up period (
Hazard curves revealing the time of cancer progression. Smoothed hazard estimates for the presence of a risk component in
To establish a clinically valuable prognostic biomarker based on our MRSB to predict the individual risk of disease progression, we developed a predictive model by combining MRSB and common clinical covariates using a nomogram. Based on the decision curve, we found that pathologic tumor stage was a better evaluation factor than histological grade (
The nomograms based on MRSB are reliable predictors for the prognosis of BLCA patients.
Methylation risk score for BLCA (MRSB) component-related genes were defined as the genes at which the probe closest to the transcription start site (TSS) was located, based on the University of California, Santa Cruz (UCSC) Genome Browser known-gene list. To evaluate the potential contribution of the MRSB component-related genes to BLCA progression, we analyzed the RNA-seq data of those genes in BLCA samples in correlation with patient PFS and OS in the TCGA cohort (
Expression patterns of MRSB component-related genes.
We used GSEA to explore the biological effects mediated by the methylation model and the corresponding genes. We found that several gene sets associated with tumor progression were enriched, and the associated genes were upregulated in the high-risk group (
Cellular pathways/processes affected by MRSB methylation model.
In this study, we identified genes with methylated CpGs that were associated with BLCA progression after surgical treatment, based on the analysis of TCGA BLCA DNA methylation data using two computational algorithms, SVM-RFE and iterative LASSO. Consequently, eight genes with specific CpG methylations were selected to build the MRSB in the training cohort, which was validated in the validation cohort. We demonstrated that MRSB was able to classify BLCA patients into high- or low-risk disease progression subgroups. Moreover, our data also showed that MRSB could predict the risk of disease recurrence and patient OS. Therefore, MRSB has great potential to be used to predict the post-surgery progression risk of BLCA, and it may provide novel insight into BLCA progression and an opportunity for stratified therapeutic strategies.
On the basis of MRSB, low-risk patients can avoid the toxic adverse effects of adjuvant treatment, while high-risk patients will be selected to receive active surveillance and intensified regimens to prevent tumor progression (
Recently, three genome-wide studies have reported DNA methylation in BLCA. In the first study, four specific methylation regions were identified to predict the progression potential of NMIBC to MIBC by analyzing 192 patients with primary pTaG1/G2 BLCA. The areas under the curve for GATA binding protein 2 (GATA2), T-Box transcription factor 2 (TBX2), T-Box transcription factor 3 (TBX3), and Zic family member 4 (ZIC4) were 0.803, 0.644, 0.785, and 0.692, respectively (
Our study has the following advantages over the previous studies. We adopted a strategy of selecting reliable markers by combining the results of two different algorithms, which minimized the loss or neglect of important markers compared with the single-strategy approach used in previous studies. The main statistical concern when methylation data are used to develop prognostic models is the processing of the large number of markers yielded by ultrahigh-dimensional data. Overfitting of an overly large and complex methylation signal model, given the limited heterogeneity of the training cohort, compromises the independent predictive efficacy of the model (
More importantly, we established a nomogram with MRSB, age, sex, and tumor clinical stage to predict individual progression risk. The nomogram was reliable for predicting survival and recurrence risk. Therefore, our nomogram provides a potentially accurate prognostic indicator for patients with BLCA, which may be used to guide individualized post-surgery disease progression monitoring and prevention strategies.
Moreover, we found that high expression of the MRSB component-related genes TNFAIP8L3 and APC, which negatively correlated with their methylation status, was associated with a poor prognosis. Generally, tumorigenesis is influenced by transcriptional activation of oncogenes
TNFAIP8L3 has been shown to promote the progression of gastric cancer, which can be suppressed by miR-9-5p (
In colorectal cancer, APC is a well-established tumor suppressor, and its inactivation is a common mechanism of colorectal tumorigenesis. APC mutation has been associated with the activation of Wnt/β-catenin signaling pathway (
Our study has the following limitations. First, our study included patients with different disease stages. Whether MRSB is affected by heterogeneity among patients with early or advanced stage BLCA requires further investigation. Second, the biological mechanisms by which certain methylated genes, such as KRTDAP, ZC3H3, and PI3, are involved are yet to be investigated. Third, because the TCGA patients were mainly from the United States and most of the samples were from Caucasians, independent external validation in more diverse patient populations is necessary for the global application of the MRSB developed in this study. Our results warrant further investigation in a larger independent cohort.
In conclusion, MRSB, an eight-gene-based DNA methylation signature, is an efficient prognostic biomarker for predicting the progression risk of BLCA patients. The nomogram including MRSB may support individualized BLCA patient monitoring and prevention strategies. Our study not only indicates the potential value of MRSB as a prognostic predictor in BLCA but also points to a novel direction for further mechanistic research on BLCA progression.
Publicly available datasets were analyzed in this study. The raw DNA methylation and gene expression data used in this study are available in the TCGA-BLCA program (
The studies involving human participants were reviewed and approved by the Ethics Committee of Zhengzhou University. The patients/participants provided their written informed consent to participate in this study.
DS, Y-JL, LZ, and YFG performed the conception and design. YDG, PC, YC, and CH performed the data collection and processing. YFG and Y-JL performed the data analysis and interpretation. JY, YD, and YFG performed the experiments. YFG, JY, Y-JL, LZ, and DS performed the manuscript writing. All authors discussed the results and reviewed the manuscript.
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
The raw data and clinical data used in this study are based upon TCGA Research Network:
The Supplementary Material for this article can be found online at:
Net decision curve analyses demonstrating the benefit for the MRSB and the optimal clinical covariates for prognosis.
The nomograms for overall survival based on the MRSB.
The nomograms for recurrence-free survival based on the MRSB.
The expression of the MRSB component-related genes was associated with the prognosis in BLCA patients.
One hundred thirty CpGs were identified by the SVM-RFE algorithm.
Fifteen CpGs that appeared more than 500 times in 1,000 iterations and were considered consensus CpGs that distinguished high- from low-risk groups were identified by the iterative LASSO algorithm.
Gene set enrichment analysis (GSEA) results of MRSB corresponding genes in TCGA BLCA cohort.