Large Multicohort Study Reveals a Prostate Cancer Susceptibility Allele at 5p15 Regulating TERT via Androgen Signaling-Orchestrated Chromatin Binding of E2F1 and MYC

Aberrant telomerase reverse transcriptase (TERT) expression is crucial for tumor survival and cancer cells escaping apoptosis. Multiple TERT-locus variants at 5p15 have been discovered in association with cancer risk, yet the underlying mechanisms and clinical impacts remain unclear. Here, our association studies showed that the TERT promoter variant rs2853669 confers a risk of prostate cancer (PCa) in different ethnic groups. Further functional investigation revealed that the allele-specific binding of MYC and E2F1 at TERT promoter variant rs2853669 associates with elevated level of TERT in PCa. Mechanistically, androgen stimulations promoted the binding of MYC to allele T of rs2853669, thereby activating TERT, whereas hormone deprivations enhanced E2F1 binding at allele C of rs2853669, thus upregulating TERT expression. Notably, E2F1 could cooperate with AR signaling to regulate MYC expression. Clinical data demonstrated synergistic effects of MYC/E2F1/TERT expression or with the TT and CC genotype of rs2853669 on PCa prognosis and severity. Strikingly, single-nucleotide editing assays showed that the CC genotype of rs2853669 obviously promotes epithelial–mesenchymal transition (EMT) and the development of castration-resistant PCa (CRPC), confirmed by unbiased global transcriptome profiling. Our findings thus provided compelling evidence for understanding the roles of noncoding variations coordinated with androgen signaling and oncogenic transcription factors in mis-regulating TERT expression and driving PCa.


INTRODUCTION
Prostate cancer (PCa) is one of the leading causes of cancer deaths and a highly heritable cancer in men (1). Genetic heritability was estimated to account for 57% of familial risk in PCa (1). Thus, identification of genetic loci in association with PCa risk and pathogenesis, and illustration of the underlying mechanisms are expected to have substantial influence on our understanding of PCa and on the prevention and treatment of the disease. Thanks mainly to genome-wide association studies (GWAS), a substantial amount of singlenucleotide polymorphism (SNP) loci have thus far been identified to be significantly associated with PCa (2). To date, even though the detailed functions of several PCa risk SNPs have been unveiled, the regulatory mechanisms underlying many SNPs remain unidentified. Focusing our research scope on PCa, important oncogene-associated risk SNPs might be a way to speed up our understanding of the functions and the risk of SNPs.
The human telomeres are a safeguard of chromosome ends, whose main function is the maintenance of telomeric DNA length and chromosomal stability (3,4). Most malignant cells in cancer, including PCa, achieve unlimited replicative capacity, a hallmark of cancer (5), through activating telomerase for telomere maintenance (6,7). The key rate-limiting element for telomerase activity was found to be TERT, encoding an essential catalytic subunit of telomerase (8)(9)(10) that is aberrantly expressed in many types of cancer. Understanding the mechanisms of aberrant TERT expression is a fundamental question for human cancer, and TERT is also a potential clinical target for improving cancer diagnosis and prognosis (11)(12)(13). At present, a variety of transcription factors have been found to regulate the transcription of TERT by directly acting on the promoter region of TERT or indirectly in the form of complexes. There are two canonical E-box consensus sites at the hTERT core promoter of MYC family transcription factors, which are located at upstream 165 bp and downstream 45 bp relative to the hTERT transcriptional start site (TSS). To date, the mechanisms underlying high TERT expression in PCa are still not fully investigated.
In recent years, large-scale genome-wide association studies (GWAS) have made important contributions to the identification of common mutations at TERT sites. Many studies have confirmed that somatic mutations and functional SNPs of TERT genes are associated with multiple cancer risks. For example, rs2736100 was found to be associated with high risk of glioma, lung adenocarcinoma, and testicular germ cell cancer; rs402710 was found to be associated with lung cancer; and T allele of rs10069690 increased the risk of estrogen-receptor-negative breast cancer. rs2242652, rs7725218, rs2853677, and rs401681 have been reported to be associated with PCa risk. These four SNPs are located at the intron or 3'UTR region of TERT. To further validate whether there are SNPs located at the promoter region of TERT associated with PCa risk, we focused our research on the SNPs located around TERT.

Study Population, SNP Genotyping, and Quality Control
A total of 209,741 male participants with Caucasian ancestry from UK Biobank (release V3) with GWAS genotyping array data and imputation information were included in the present study. A detailed description of the population of UK Biobank was reported elsewhere (14). Briefly, this is a prospective cohort with genetic and phenotypic data from 500,000 individuals across UK between 40 and 69 years at the age of recruitment. A PCa diagnosis was identified as having a record of PCa from national cancer registries and self-report based on ICD-10 code (C61) before or after recruitment in patients with European Ancestry. The current study was based on the update of the database in December 2020. Genome-wide genotype data with imputation and quality control (QC) were provided by UK Biobank. Genotype information of rs2853669 and its nearby region (± 50 kb) was then obtained from the dataset for further analysis.
An additional independent population with Chinese Han ancestries was also included in the present study for further validation. It included 2,510 male patients from a biopsy cohort based on Chinese Han population (1,100 PCa patients vs. 1,410 healthy men as controls). Clinical information was collected including diagnosis of PCa, Gleason Score, and prostatespecific antigen. SNPs were genotyped using the Illumina Asian Screening Array (ASA) for Chinese patients. Subjects were removed from subsequent analyses if they met any of the following criteria: (1) overall genotype call rate <90%; (2) duplicates or familial relationship (PI_HAT > 0.025). Genotypic QC of rs2853669 was also performed (1) if the genotype call rate >95% in the three populations; and (2) if it passes a Hardy-Weinberg Equilibrium test (p > 1 × 10 −3 ). Finally, this SNP could be further evaluated in these populations.

Association Study and Fine Mapping
Logistic regression models including additive model, dominant model, and recessive model were used to evaluate the association between the SNPs and different phenotypes of PCa performing by using Plink 1.09. Cox regression survival analyses were performed using R (3.6.3). A two-tailed p < 0.05 was considered statistically significant. Fine mapping for the ±50kb region of rs2853669 was performed based on the association results from UK Biobank using LocusZoom (http://locuszoom. sph.umich.edu/). The linkage disequilibrium (LD) in this region was evaluated by LDlink (15).

Cell Culture
All cell lines were originally purchased from the American Type Culture Collection (ATCC), and none of these cell lines were found to be contaminated with mycoplasma during our study. HEK293T cells were grown in DMEM (Invitrogen, catalog no. 10569010); LNCaP and 22Rv1 cells were grown in RPMI1640 (Merck, catalog no. R8758). DMEM and RPMI1640 were supplemented with 10% fetal bovine serum (Gibco, catalog no. 10099-141C) and antibiotics (Thermo Fisher Scientific, catalog no. 15140122). RWPE1 cells were grown in Keratinocyte-Serum Free Medium (10724-011, Gibco) with 0.05 µg/ml epidermal growth factor (PHG0311, Gibco), 0.05 mg/ml bovine pituitary extract (02-104, Merck), and antibiotics. In order to study AR activity, we cultured the LNCaP and 22Rv1 cells in charcoal (Merck, catalog no. C6241-5G) stripping media up to at least 48 h. AR activity was induced by treating cells with 100 nM dihydrotestosterone (DHT; dissolved in methanol, Merck, catalog no. D-077-1ML).

Cell Viability and Proliferation Assays
Mutated 22Rv1 cells treated with DHT or ethanol (2 × 10 3 per well) were seeded in 96-well plates. Cell viability and proliferation were determined with XTT assays (Roche, 11465015001) at designed time points by measuring the absorbance at 450 nm, following the manufacturer's instructions. Values were obtained from three replicate wells for each treatment and time point. The results were representative of three independent experiments. Significance was calculated by a two-tailed t test.

Invasion Assay
Cells were detached by trypsinization and resuspended in serum free charcoal stripping media growth medium at 2.5 × 10 5 cells/ ml. Two hundred microliters of cell suspension with or without androgen treatment was transferred into 8-mm Transwell inserts (Corning Costar, catalog no. 3422) with 100 µl of Matrigel (diluted with serum free medium to 250µg/ml) coating (BD Biosciences, catalog no.356230). Charcoal stripping media growth medium (750 µl) with or without androgen was added in the lower chambers. After 48 h, 3.7% formaldehyde was used to fix the cells; after being permeabilized with methanol, cells were stained with Wright-Giemsa (WG16-500ml, Sigma). Cells on the upper surface of the membranes were removed and the cells at the bottom surface of the filters were quantified by counting the number of cells that penetrated the membrane in five microscopic fields (acquired at 20× magnification) per membrane. A two-tailed t-test was employed to perform statistical analysis from three replicate inserts.

Chromatin Immunoprecipitation
The ChIP experiment was performed as previously described (16). Briefly, cells were crosslinked with 1% formaldehyde (Merck, catalog no. F8775) and the reaction was stopped with 125 mM glycine (Merck, catalog no. G8898-1KG). The nuclei were isolated from cells and then suspended in SDS lysis buffer. Nuclear extracts were sonicated to generate an average size of 400-bp chromatin fragment. In each reaction, 6 µg of indicated antibodies (E2F1, Invitrogen, catalog no. 32-1400; MYC, Invitrogen, catalog no. MA1-980) or control IgG (ABclonal, catalog no. AC011) was incubated with 70 µl of Dynabead protein G (10004D, Invitrogen) slurry in IP buffer for 10 h. Then, the supernatant was removed, and fragmented chromatin was diluted in 1.3 ml of IP buffer, which was added onto bead/ antibody complexes. After 12 h of incubation, the complex was washed once with wash buffer I (20 mM Tris-HCl, pH 8.0, with 2 mM EDTA, 0.1% SDS, 1% Triton X-100, and 150 mM NaCl) and one time with buffer II (20 mM Tris-HCl, pH 8.0, with 2 mM EDTA, 0.1% SDS, 1% Triton X-100, and 500 mM NaCl), followed by two washes with buffer III (10 mM Tris-HCl, pH 8.0, with 1 mM EDTA, 250 mM LiCl, 1% Deoxycholate, and 1% NP-40) and buffer IV (10 mM Tris-HCl, pH 8.0, and 1 mM EDTA). DNA was purified with MinElute PCR Purification Kit (Qiagen, catalog no. 28006) and the target DNA fragments were analyzed by qPCR or direct Sanger sequencing of PCR fragments harboring these SNPs.

Quantitative PCR
Each target fragment was amplified using SYBR Select Master Mix (Applied Biosystems, catalog no. 4472908). All target primers had three technical replicates, and the data were normalized to the control regions; then, the relative enrichment of the target antibodies at target DNA fragment was determined by comparison with the IgG control. Primer sequences used in this experiment can be found in Supplementary Table 2.

Wound Healing Assays
Cells were seeded into 24-well plates and allowed to grow to near confluency. P200 pipette tip was used to scrape the cells and the same field was imaged after 72 h. To induce AR activity, the cells were treated with 100 nM DHT. The area of the wound in each well was analyzed using ImageJ. Results are representative of three independent experiments. Statistical significance was calculated by the two-tailed t test.

Single-Nucleotide Mutation Using CRISPR/Cas9
The experiment was performed according to the previous protocol (17). Briefly, one pair of sgRNA (sgRNA1-top: CACCGCCAGGACCGCGCTTCCCACG, sgRNA1-bottom: AAACCGTGGGAAGCGCGGTCCTGGC) was designed (https://portals.broadinstitute.org/gpp/public/analysis-tools/ sgrna-design) and inserted into pSpCas9 (BB)-2A-Puro (PX459) (Addgene plasmid ID: 48139). rs2853669 (T or C) centered DNA fragments were cloned into pGL3 basic vector to generate repair templates (Supplementary Table 1). Three hundred nanograms of indicated Cas9 plasmid [pSpCas9 (sgRNA) with sgRNA] and 300 ng of pGL3 vector with repair templates were co-transfected into 22Rv1 cells using Lipofectamine 2000. Medium was changed 24 h later. Puromycin (0.8 µg/ml) (Merck, catalog no. P9620) was added onto transfected cells after 48 h. After non-transfected cells were killed by puromycin, the remaining cells were sorted using flow cytometry to establish single-cell clones. The single cells were seeded in 96-well plates and checked for 9-14 days to rule out the non-single clone. Finally, the single clones were picked up for subculture and genotyping.

Electrophoretic Mobility Shift Assays
T h e e x p e r i m e n t w a s p e r f o r m e d w i t h L i g h t S h i f t ® Chemiluminescent EMSA kit (Thermo Fisher Scientific, catalog no. 20148), following the manufacturer's instructions. Full length of E2F1 or MYC was cloned into vector pcDNA3.1/V5-HisA (Invitrogen, catalog no. V81020). Briefly, double-stranded biotin-labeled consensus DNA was incubated with HEK293T cell nuclear extract with ectopically expressed E2F1 or MYC in a 1× binding buffer, 50 ng/µl poly(dI:dC)·poly(dI:dC), 2.5% glycerol, 0.05% NP-40, 5 mM MgCl 2 , and 10 mM EDTA, and 200-fold excess of unlabeled probes was used for competition assays. The protein complexes were resolved on 6% DNA retardation gels for 1 h at 100 V, transferred to Biodyne B Nylon Membranes (PALL, catalog no. 60208), cross-linked, and detected using streptavidin-HRP conjugate and a chemiluminescent substrate. The oligos used are listed in Supplementary Table 3.

Lentiviral Constructs, Lentivirus Production, and Infection
The shRNA constructs targeting E2F1 and MYC were ordered from Merck. The shRNA sequences used are listed in Supplementary Table 4. Second-generation lentiviral vectors were packaged using HEK293T cells. In detail, HEK239T cells were trypsinized and seeded into 3.5-cm plates; 24 h later, the normal medium was replaced with 2 ml of low-glucose DMEM (Invitrogen, catalog no. 21885025) containing 10% FBS and 0.1% penicillin and streptomycin. Cells were co-transfected with indicated shRNA construct (1.5 mg each), pMD2.G (envelope plasmid, 0.375 mg) (Addgene, catalog no. 12259), and psPAX2 (packaging plasmid, 1.125 mg) (Addgene, catalog no. 12260) plasmids using 8 µl of Lipofectamine 2000 (Invitrogen, catalog no.11668019). The medium was changed to fresh medium after 24 h, and the virus-containing medium was harvested every 24 h up to three times. Lentivirus was passed through a 0.45-mm filter unit and stored at −80°C. For viral transduction, LNCaP cells were seeded in six-well plates at a density of 60%-70%. Sixteen to 20 h later, cell culture medium was replaced with lentiviruscontaining medium with final 8 mg/ml polybrene (Merck, catalog no. H9268). For lentivirus-mediated knockdown experiment, virus was removed and replaced by normal medium containing final 1 mg/ml puromycin (Merck, catalog no. P9620) after 24 h. When uninfected control cells were completely killed, the target cells were cultured in normal growth medium with 0.5 mg/ ml puromycin.

Motif Analysis
The effect of rs2853669 on transcription factor binding motifs was analyzed using R package at SNP v1.2.0 (affinity test for regulatory SNP detection) (18) in R (v.3.6.3). Binding affinity tests were performed for the motif matches between MYC and E2F1 with alleles of rs2853669 using the derived motif library ENCODE. R packages "seq Logo" (v. 1.52.0) (19) and "universal motif" (v. 1.4.10) (20) were applied to create and plot motif logos.
Genotype Imputation IMPUTE2 (v. 2.3.2) 64 was used to perform the genotype imputation of rs2853669 from three cohorts, TCGA, Stockholm, and Cambridge, composed of 389, 94, and 119 prostate samples, respectively. QCtool (v.2.0.7) (20) was used to assess and perform quality control by setting parameters "-threshold 0.9, -snp-stats". SNPs that failed to pass the quality checking were excluded from imputation. 1000 Genomes Phase 3 data were selected as reference panel (18). Parameters were set as default "-Ne = 2000 and -k hap = 500". A SNP-centered 2-MB region was set for the imputation on chromosome 5. Genotypes in the Gen format were converted to dosage format for downstream analysis. The transcriptional profiling was assessed by Illumina Expression Bead Chip in Swedish and Cambridge human prostate tissue samples, while RNA-seq was used in TCGA samples. The Stockholm and TCGA cohorts were genotyped on Illumina Omni 2.5 and Affymetrix SNP array 6, respectively.

RNA-Seq and Differential Expression Analysis
Mutated T/T cells and C/C cells were harvested, total RNA was extracted with Trizol Reagent (Roche, catalog no. 11667165001) and ethanol precipitation, and then samples were sent to BGI Group for sequencing. For the RNA sequencing of the T/T and C/C genotyped cells, single-end raw sequence reads were first pre-processed with FastQC (21) to assess the read quality. SOAPnuke (22) was employed to process reads for quality trimming and adapter removal with the following criteria: Reads with adaptors were removed, reads with more than 5% of Ns were filtered out, and low-quality reads with more than 20% of bases with quality score smaller than 10 were ruled out. Cleaned reads were further trimmed to 50 bp. A final FastQC run was performed to ensure the success of previous quality control steps. The processed reads were aligned against the human genome assembly hg38 using STAR version 2.7.2a (23) with default settings. HTSeq (htseq-count) was employed to quantitate aligned sequencing reads against gene annotation from Encode and with parameters "-s no, -i gene_name". Differential expression analysis was performed from read count matrix using Bioconductor package DESeq2 (1.26.0) (24). Genes with low expressions (<2 cumulative read count across samples) were filtered out prior to differential expression analysis. A threshold of FDR < 0.05 was applied to generate the differentially expressed gene list. Data were normalized using variance Stabilizing Transformation (VST) method from DESeq2. A sample-to-sample distance matrix using hierarchical clustering with Euclidian distance metric from normalized total transcriptome of each sample was applied to examine expression correlations among biological replicates. Heatmap displaying differentially expressed genes between C/C and T/T was generated using the R package "pheatmap" (1.0.12). We have deposited the RNA-seq data to European Nucleotide Archive (ENA) with study accession number PRJEB47829.

Gene Set Enrichment Analysis
We applied Gene Set Enrichment Analysis (GSEA) to interpret the biological mechanisms underlying the RNA-Sequencing of T/T and C/C genotyped cells from differential expression results obtained from DESeq2. The pre-ranked gene list was obtained by sorting the "stat" statistics from the differential gene list in a descending order. GSEAPreranked test (25) was applied to examine the enrichment of upregulated genes in C/C allele compared to T/T allele from the MSigDB database including Hallmark gene sets, WikiPathways gene sets, chemical and genetic perturbations gene sets, and Biocarta gene sets.

Generation of the CC Genotype Signature
The CC genotype signature was developed from the differentially expressed genes upon C/C vs. T/T alleles from RNA-seq. A list of 311 genes was initially generated by applying cutoff with FDR ≤ 0.01 and log e ≥ 2 to the 5016-DE gene list. We defined this 311gene list as CC genotype signature and acquired available gene expression data from cBioportal, which further resulted in a 205gene signature. We then calculated the Z score value of this signature by summing up the normalized gene expression data and further examined its correlation with AR signaling, CCP, hypoxia, and EMT scores.

Statistical Analysis
All statistical analyses were performed using RStudio (v. 1.2.5033) (28) with R (v. 3.6.3) (29). Data were obtained from the cBioPortal for Cancer Genomics (30,31), Oncomine database (32), and GEO database (33,34). Differential gene expression analyses were applied among normal prostate, tumor, and metastatic tissues from various independent cohorts. Statistical analyses were performed to study the correlation between gene expression levels and clinical features including lymph nodes. Statistical tests for patients with gene expression defined to two groups were calculated using Mann-Whitney U test, while Kruskal-Wallis H test was applied for cohorts with more than two groups. Kaplan-Meier survival analysis was used to assess the impact of gene expression levels on PCa prognosis and survival. Patients were stratified based on median expression of genes or genotype of rs2853669. For the association between rs2853669 genotype and the prognosis survival, we tested several scenarios considering the synergistic effects of gene expression levels and rs2853669 genotype. The survival analyses were performed and visualized as Kaplan-Meier plots by using the R package "Survival" (v. 3.2.3) (35,36) and "Survminer" (v. 0.4.7) (37). Function "Surv" was first used to generate the survival models with "time-to-event" and "event status" as input from clinical cohorts. Median expression of genes was further followed to fit the models by function "survfit". Statistical analyses for all Kaplan-Meier curves were calculated using log-rank test. Cox proportional hazards model was employed to assess the hazard ratio (HR) (38). To examine the association of expression of TERT, E2F1, and MYC with androgen signaling, we devised a representative AR signaling signature with a panel of 10 genes, including SOX9, RAN, TNK2, EP300, PXN, NCOA2, AR, NRIP1, NCOR1, and NCOR2. The Z score sum of these 10 genes was calculated, and patients were grouped by the median expression of the AR signaling signature. For correlation analysis of gene expressions, we tested the linear correlation among the expression levels of TERT, MYC, and E2F1 in benign prostate and tumor issues in several independent cohorts from cBioPortal and Oncomine databases. Both Pearson and Spearman methods were applied to assess the co-expression correlations between gene expression levels. For microarraybased expression profiling, we selected gene probes with lowest p-values. Samples with missing genotype, expression, or patient survival data were excluded from analyses. p-value < 0.05 was considered to be statistically significant. Statistical tests and figures were generated in R (3.6.3)

Discovering an Association of the TERT Promoter Variant rs2853669 With PCa Susceptibility
To investigate whether TERT expression was affected by SNPs, we performed an association study and fine mapping analysis in several independent populations (see Methods) using the SNPs resided in TERT or far from TERT 5′ promoter region. We found several SNPs associated with PCa risk; however, among these SNPs, only one SNP named rs2853669 (located at 245 bp to ATG site of TERT) was residing in the functional region. Fine mapping showed strong linkage disequilibrium (LD) between rs2853669 and SNPs at 5'-UTR, but not the coding region of TERT ( Figures 1A, B). Interestingly, our results indicated an opposite association of C or T allele of rs2853669 with PCa in different populations. The C allele of rs2853669 was significantly associated with PCa susceptibility (odds ratio, OR = 1.10 (95% CI 1.07-1.13), p = 4.76 × 10 -10 , Table 1) in Caucasians from UK Biobank (10,207 cases and 199,534 controls), and the T allele of rs2853669 was considered as a risk allele [OR = 1.29 (95% CI 1.01-1.64), p = 0.038, Table 1] in the Chinese population (1,100 cases and 1,410 controls). In addition, we observed the similar OR value of T allele at rs2853669 in another Chinese population cohort (39) even though the p-value is not significant [OR = 1.28 (95% CI 0.89-1.83), p = 0.18, Table 1]. Thus, the regulatory mechanism underlying rs2853669 needs to be deeply investigated to explain these contradictory results.

rs2853669 Altered Allele-Specific Chromatin Binding of E2F1 and MYC
To shed light on the mechanisms underlying rs2853669 in PCa, we first sought to find whether there might be oncogenic transcription factors binding at this SNP-containing region. We thus observed ChIP-seq enrichment signals of MYC and E2F1 at the rs2853669-containing region, and this observation was further verified in LNCaP cells (Figures 1C, D). We next performed SNPs and transcription factor DNA-binding motif matching analysis (40) and examined whether variation at rs2853669 directly modulates transcription factor binding. This analysis revealed that rs2853669 maps within the binding motifs of both E2F1 and MYC ( Figure 1E). Interestingly, E2F1 had a higher preference for the C allele whereas MYC favored the T allele of rs2853669. To confirm this allele-specific DNA binding, we performed electrophoretic mobility shift assays (EMSAs) with HEK293T cell nuclear extract containing ectopically expressed E2F1 or MYC, respectively. Consistent with the motif analysis results, we observed the DNA binding preference of E2F1 to the C or A allele and MYC to the rs2853669 G or T allele ( Figures 1F, G). In line with in silico motif analysis and in vitro EMSA results, our ChIP-qPCR followed by Sanger sequencing results further proved that MYC or E2F1 preferentially bound to different alleles of rs2853669 in vivo in LNCaP cells ( Figure 1H). Altogether, we demonstrated that both MYC and E2F1 can bind at the TERT promoter region, and MYC prefers binding to T allele while E2F1 favors the C allele of rs2853669.
TERT Was a Potential Target Gene of E2F1 and MYC To assess whether MYC or E2F1 affects the expression of TERT, we performed both ectopic overexpression and short hairpin RNA mediated knockdown of MYC or E2F1 assays in PCa cells. The results showed that the protein level of TERT was upregulated upon overexpression of MYC or E2F1 and downregulated upon knockdown of MYC or E2F1 ( Figure 1I and Supplementary Figures S1A-D). Furthermore, we observed significant positive correlations between the mRNA expression levels of TERT and MYC or E2F1 respectively in multiple large cohorts of clinical prostate tissue samples ( Figures 1J-O and Supplementary Figures S1E-J), suggesting that MYC and E2F1 regulate the expression of TERT in the clinical settings. Interestingly, our genome-wide co-expression analysis revealed that TERT is the target gene in both TGCA and CPGEA cohorts (Supplementary Figures S1K, L), whereas TERT is a topranking target gene positively correlated with E2F1 in one cohort (41) but not with MYC, while in the other cohort (42), TERT and MYC were highly positively correlated, and no significant expression correlation was observed between TERT and E2F1 (Supplementary Figures S1M, N). These results further indicated that MYC or E2F1 might regulate TERT expression through different regulatory mechanisms.

The Role of E2F1 in Regulating MYC Disturbed by AR Signaling Pathway
Given that there are close regulatory associations between AR signaling and MYC or E2F1 in prostate tumors (43)(44)(45), we investigated the correlation between AR signaling intensity and the expression of MYC, E2F1, or TERT. We observed higher MYC and lower E2F1 expression levels in the PCa patient group with higher AR signaling activity compared to that of lower AR signaling group (Figures 2A, B, D, E and Supplementary  Figures S2A-D). No clear expression correlation between TERT and AR signaling was observed despite the fact that TERT was regulated by MYC and E2F1 ( Figures 2C, F and Supplementary Figures S2E, F). Moreover, E2F1 expressions were elevated in metastasis prostate tumors in several independent cohorts of PCa ( Figure 2G and Supplementary  Figures S2G-I). TERT expressions showed an elevated trend in metastasis prostate tumors in several independent cohorts of PCa even though the significance was not strong in Taylor and Yu datasets ( Figure 2H and Supplementary Figures S2J-L). In contrast, the expressions of MYC in metastasis prostate tumors in different cohorts were inconsistent ( Figure 2I and Supplementary Figures S2M-O). These inconsistent results might be attributable to the differences in AR signaling pathway in the metastasis prostate tumors. Therefore, the association of MYC, E2F1, and TERT with androgen-dependent PCa progression to androgen-independent stage needs to be deeply investigated.
Thus, we next cultured LNCaP cells with hormone-deprived medium for 18 days and analyzed the expression of MYC, E2F1, and TERT at five time points ( Figure 2J). Notably, the MYC level   Figures S1C, D). In addition, it was reported that E2F1 physically interacts with AR (46), suggesting that the regulatory association between E2F1 and MYC might correlate with AR signaling pathway. We thus treated PCa cells with or without dihydrotestosterone (DHT) and performed ChIP-qPCR experiment. The results demonstrated that E2F1 has an apparent stronger binding at MYC promoter region in cells with DHT treatment than in control cells ( Figure 2S), indicating an androgen signaling-dependent manner. Taken together, in androgen-responsive cells, E2F1 might cooperate with AR signaling pathway to promote TERT expression through upregulating the expression level of MYC when cells were under higher androgen stimulation. However, in an androgen deprivation environment, E2F1 could directly upregulate TERT expression without the involvement of MYC to assist androgendependent cells' survival in adverse environments.

AR Signaling Pathway Coordinated rs2853669-Mediated Regulation of TERT Expression Through E2F1 and MYC
These findings together lead us to an important assumption whether the regulatory mechanisms underlying MYC and E2F1 at the rs2853669 region were coordinated with the status of androgen signaling. To test this hypothesis, we performed luciferase-based promoter assays and observed that androgen stimulus obviously increased the promoter activity of the region harboring T allele but decreased the promoter activity of the region with the C allele in LNCaP and 22Rv1 cells, respectively. Conversely, the region harboring the C allele was observed with an increased promoter activity compared to the region with T allele under androgen deprivation conditions ( Figure 3A and Supplementary Figure S3A). We then tested the promoter activity in RWPE1 cells; T allele increased the promoter activity when undergoing DHT treatment, but both the plasmid with T and C allele had no obvious luciferase activity in normal prostate epithelial cells (Supplementary Figure S4B). We further performed ChIP followed by quantitative PCR and Sanger sequencing in androgen-sensitive PCa LNCaP cells.
Androgen stimulation significantly promoted MYC binding at the fragment harboring rs2853669, whereas androgen deprivation obviously increased the enrichment of E2F1 at this region ( Figures 3B, C). Different from cancer cells, we did not observe the enrichment of either E2F1 or MYC at this region in normal RWPE1 cells, whereas MYC showed obvious enrichment at this SNP containing region when the cells were treated with DHT (Supplementary Figures S4C, D). Moreover, MYC was preferentially recruited to T allele at rs2853669 under DHT treatment, whereas E2F1 showed stronger binding affinity with C allele than T allele after androgen removal ( Figure 3D). Consistently, we observed that C allele at rs2853669 is associated with higher TERT expression in PCa patients with higher expression levels of E2F1, and TERT indicated a trend to be upregulated in patient group carrying T allele at rs2853669 with higher MYC expression levels ( Figures 3E, F). There was no correlation between TERT expression and those two alleles at rs2853669 in patient group with lower E2F1 or MYC expression levels (Supplementary Figures S3E, F). To further provide evidence to support our results, we overexpressed E2F1 or MYC in T/T and C/C mutated cells; ectopic overexpression of E2F1 in C/C cells could significantly enhance TERT expression level compared with that in T/T cells, while ectopic overexpression of MYC in T/T cells could significantly enhance TERT expression level compared with that in C/C cells (Supplementary Figures S4F, G). Taken together, these results suggested that T allele could promote TERT expression through increasing the binding of MYC to TERT promoter when PCa cells were androgen-stimulated. In contrast, TERT expression level was maintained through cooperating the C allele of rs2853669 with E2F1 under androgen-deficient environment in PCa cells.

Direct Effects of rs2853669 on EMT and CRPC
To further assess the phenotypic impacts of rs2853669 alteration, we first performed single-nucleotide mutation using CRISPR/ Cas9-mediated genome editing approach and successfully converted the genotype of rs2853669 from T/C to T/T or C/C in the PCa cell line 22Rv1 ( Figure 4A). We applied ChIP-qPCR to examine the occupancy status of MYC or E2F1 at the rs2853669containing region in our mutated cells. We found that MYC chromatin occupancy at the rs2853669 locus was higher in the TT clones than in the TC and CC clones, while E2F1 chromatin occupancy at the rs2853669 locus was higher in the CC clone than the TC and TT clones ( Figure 4B).
Although there were no obvious morphology differences among rs2853669 T/T, parental T/C, and C/C 22Rv1 cell lines (Supplementary Figure S4A), cadherin switching, a major hallmark of EMT (36), was observed among those clones. Downregulation of E-cadherin and upregulation of N-cadherin were confirmed by immunofluorescence and Western blot studies, indicating that C allele of rs2853669 played a pivotal role in EMT (Figures 4C, D). To explore whether androgen had influence on cellular proliferation and migration of the 22Rv1 cells with different genotypes of rs2853669, we treated cells with or without DHT and performed cell proliferation, wound healing, and invasion assays. T/T cells grow faster than T/C and C/C cells under DHT treatment ( Figure 4E), whereas C/C cells grow faster than the other two genotyped cell clones after removing androgen ( Figure 4F). Consistently, we found an obvious inhibition of wound closure and invasion ability in both T/T and T/C cells after removing androgen, while wound closure and invasion ability in C/C cells was not affected by androgen deficiency (Figures 4G, H and Supplementary  Figures S4B-E). However, we did not observe obvious difference between the wound closure and invasion ability among T/T, T/C, and C/C cells when treated cells with DHT.
To further explore the underlying biological mechanisms that were affected by the different alleles at rs2853669, we next performed RNA sequencing (RNA-seq) analysis of T/T and C/C genotyped cells. Two biological replicates were performed in each group and high correlations between replicates were observed (Supplementary Figure S5A). The differential gene expression analysis identified 2,644 and 2,462 significantly upregulated and downregulated genes, respectively (FDR < 0.05; Figure 5A). Gene Set Enrichment Analysis (GSEA), performed in various gene sets from MSigDB, identified multiple pathways relevant to cell cycle and cancer development significantly enriched in upregulated genes of C/C cells compared to control cells with T/T alleles. Several cell growth-related genes in androgen response, hypoxia, Met, and EGF pathways were found upregulated in C/C cells (Supplementary Table S5, Figures 5B, C, and Supplementary  Figures S5B, C). To explore the role of targeted genes affected by C/C alleles at rs2853669, we developed a CC genotype signature score and examined its relevance in various independent PCa cohorts. We found significant positive linear correlation between CC genotype signature and AR signaling ( Figures 5D-F and Supplementary Figures S5D-F), cell cycle progression (CCP) (Figures 5G-I), and hypoxia scores ( Figures 5J, K and Supplementary Figures S5G, H), which was in line with the results showing that C/C cells possessed more proliferative capability ( Figure 4F). Moreover, in concordance with the above experimental results (Figures 4G, H), we found that  genes involved in EMT, metastasis, and TGF-beta pathways were also upregulated in C/C cells ( Figure 5L and Supplementary  Figures S5I, J). The correlation analysis also revealed remarkable positive association between CC genotype signature score and EMT score in multiple PCa cohorts ( Figures 5M-O and Supplementary Figures S5K-M), further indicating a higher invasiveness capacity of C/C cells compared to T/T cells. Collectively, our data validated that CC alleles maintain cell survival in hormone deficiency environment and play a crucial role in EMT progression. elevated risks of biochemical relapse, and metastasis. In contrast, the PCa patient group with simultaneous triple low expression appeared with better prognosis (Figures 6A-D and Supplementary Figures S6A, B). Notably, this observation was not found in normal prostates of the CPGEA cohort (Supplementary Figure S6C). To further investigate the synergistic effects of the three genes in clinical settings, we examined its correlation with several clinical features including biochemical recurrence, tumor stage, metastasis, PSA, and patient neoplasm status in PCa cases. The results revealed that percentages of patients with biochemical relapse were significantly higher in groups with triple high expression of the three oncogenes ( Figure 6E and Supplementary Figure S6D). The percentages of tumor stage III and IV were 62.2% and 79.4% in the low or high expression groups, respectively (p = 0.023) ( Figure 6F). Patients carrying tumors were also substantially more in the triple high group (Supplementary Figures S6E, F).
Proportions of patients with metastasis or PSA higher than 20 ng/ml were notably elevated in triple high group ( Figures 6G, H and Supplementary Figure S6G). Moreover, the percentage of patients with advanced Gleason score was significantly higher in the triple high group in multiple cohorts ( Figures 6I-K). These findings suggested that synergistic triple high expression of E2F1, MYC, and TERT were greatly associated with PCa severity and poorer prognosis. We next performed the multivariate Cox regression model to investigate the incorporated gene signature of TERT, E2F1, and MYC together with clinical prognostic factors influencing survival in PCa patients. The results revealed that the performance of the risk of the triple overexpression of the gene signature had the highest hazard ratio (Figure 6L), indicating a superior prognostic value in PCa compared to other clinicopathological features. Taken together, these findings indicated that the synergistic co-overexpression of TERT, E2F1, and MYC in PCa is strongly associated with poor prognosis, tumor progression, metastasis, and patient survival.

Allele-Specific Impact of rs2853669 on PCa Survival
We next asked whether the genotype of rs2853669 impacts PCa prognosis and thus examined the correlation of the CC genotype at rs2853669 with clinical features in PCa patients. This analysis showed that patients carrying genotype CC or TC at rs2853669 have a shorter time for biochemical relapse than patients with the TT genotype in the Stockholm cohort ( Figure 7A). We also observed a clear trend in the CPGEA Nature 2020 dataset showing C allele associating with poor prognosis (Supplementary Figure S7A). Considering that E2F1 and MYC are essential factors driving the severity of various tumors (50, 51), we thus explored whether the correlation of rs2853669 genotype and clinical variables was affected by MYC or E2F1 expression status. We observed that the CC genotype at rs2853669 Regulates TERT via Hormone rs2853669 was correlated with increased risk of biochemical relapse and overall survival rate in the PCa patient group with higher E2F1 expression ( Figures 7B, C). We additionally tested the association between the SNP and patient survival with the consideration of E2F1 expression levels in Stockholm and CPGEA Nature 2020 datasets shown in Supplementary  Figures S7B, C. The results from both datasets showed that C allele is associated with poor prognosis. We observed a similar trend between the TT genotype at rs2853669, and the overall rate was observed in the patients with high expression of MYC ( Figure 7D). These results further proved that E2F1 and MYC drive the progression of PCa through different alleles at rs2853669.

DISCUSSION
Over the past 15 years, GWASs have successfully identified many pleiotropic loci associated with PCa severity, despite the fact that a certain number of association results are inconsistent across different study cohorts. In this work, we also observed the opposite associations between rs2853669 and PCa in different populations. The C allele of rs2853669 was significantly associated with PCa severity observed in Caucasians from UK Biobank whereas T allele was found as a PCa risk allele in Chinese population. We mechanistically determined that MYC and E2F1 regulate TERT expression in PCa via a regulatory element that is disrupted by the T allele or C allele at rs2853669, respectively. The switch with T or C allele playing roles in regulating TERT expression was determined by AR signaling though other unknown factors may not be ruled out. These findings suggested that either T or C allele at rs2853669 can be the risk allele when cells are under certain molecular conditions. Disease severity, hormone levels, and biological contexts among the clinical samples might cause different observations regarding the association between rs2853669 and PCa risk. Importantly, while upregulation of MYC by E2F1 was reported in other cancer types (52,53), our findings deeply uncovered a critical role for E2F1 in regulating MYC expression in PCa providing a new perspective for exploring the crosstalk between E2F1 and MYC that function as two regulators in G1/S transition and tumor cell growth. Moreover, our study uncovered the critical role of C allele at rs2853669 in CRPC progression and EMT, dispensing new clues to unveil the molecular mechanisms underlying PCa development and progression. Both T and C alleles were essential for cell growth at the primary stage in PCa. However, C allele at rs2853669 showed higher risk association than that of T allele, implying that it plays a vital role in maintaining cell growth and enhancing invasion ability in hormone-free environment by recruiting E2F1. Future studies are required to discover whether there might be other factors affecting the regulatory mechanism of MYC and E2F1 at this SNP considering the biological context of individuals are unique and complicated. The androgen response genes were upregulated in C/C cells, thereby explaining why C/C cells had hormone castration resistant capacity in view of the fact that AR plays an important role in CRPC. In addition to androgen response genes, many growth and metastasis-associated genes were also upregulated in C/C cells, implying that the fragment harboring rs2853669 might be an enhancer element for other genes; alternatively, this SNP could change the three-dimensional structure of genome and may warrant further investigation.
Here, we proposed a novel model for studying germline variants in PCa; this model might be popular in other sex hormone relevant cancer types, such as breast cancer. For example, the roles of rs2853669 in breast cancer are still inclusive; one study reported an increased breast cancer risk in patients carrying the CC genotype (54), whereas other studies found that the CC genotype is not related to breast cancer risk (55). Our findings might provide more clues for unveiling the regulatory mechanisms underlying rs2853669 in breast cancer, and these inclusive results may be due to altered estrogen receptor signaling pathway, thereby facilitating us to fully understand the impact of the genetic variants on human diseases.
We reported here for the first time that the hormone level is one of the important factors in regulating the function of the variants. These results also proposed that there could be complicated crosstalk between one SNP and certain important signaling pathways in PCa. Complex factors such as ethnic groups, biological characters, genetic factors, and living surroundings between individuals brought more difficulties to explain the real biological functions of these loci. In future studies, detailed classification of tumor samples is essential because there might be more risk SNPs functioning like rs2853669 undiscovered. Moreover, more efficient genome editing tools need to be explored to convert the genotypes of more SNPs at one time to speed up the research of the synergies between two or more SNPs at the cellular level. Furthermore, the biological differences between individuals should also be considered when we implicate association studies to improve the detection, prognosis, and risk evaluation in PCa.
In summary, we interpreted the functional mechanisms of a 5p15 locus SNP rs2853669 in regulating TERT expression and PCa development, which may provide potential clues for improving PCa risk prediction and prognosis.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material. Further inquiries can be directed to the corresponding authors.

AUTHOR CONTRIBUTIONS
PG, RN, and G-HW supervised the project. XD, JH, QX, BX, PZ, and HL performed the experiments with the help from QH. RN performed cohort studies. TY provided advice on bioinformatics. QZ performed bioinformatics analysis. PG, RN, G-HW, XD, and QZ designed the studies and wrote the manuscript. All authors contributed to the article and approved the submitted version.