Association of Genetic Variants Affecting microRNAs and Pancreatic Cancer Risk

Genetic factors play an important role in the susceptibility to pancreatic cancer (PC). However, established loci explain a small proportion of genetic heritability for PC; therefore, more progress is needed to find the missing ones. We aimed at identifying single nucleotide polymorphisms (SNPs) affecting PC risk through effects on micro-RNA (miRNA) function. We searched in silico the genome for SNPs in miRNA seed sequences or 3 prime untranslated regions (3'UTRs) of miRNA target genes. Genome-wide association data of PC cases and controls from the Pancreatic Cancer Cohort (PanScan) Consortium and the Pancreatic Cancer Case–Control (PanC4) Consortium were re-analyzed for discovery, and genotyping data from two additional consortia (PanGenEU and PANDoRA) were used for replication, for a total of 14,062 cases and 11,261 controls. None of the SNPs reached genome-wide significance in the meta-analysis, but for three of them the associations were in the same direction in all the study populations and showed lower value of p in the meta-analyses than in the discovery phase. Specifically, rs7985480 was consistently associated with PC risk (OR = 1.12, 95% CI 1.07–1.17, p = 3.03 × 10−6 in the meta-analysis). This SNP is in linkage disequilibrium (LD) with rs2274048, which modulates binding of various miRNAs to the 3'UTR of UCHL3, a gene involved in PC progression. In conclusion, our results expand the knowledge of the genetic PC risk through miRNA-related SNPs and show the usefulness of functional prioritization to identify genetic polymorphisms associated with PC risk.

Frontiers in Genetics | www.frontiersin.org 2 August 2021 | Volume 12 | Article 693933 BACKGROUND Pancreatic cancer (PC) incidence is rising, and it is still one of the most fatal cancers due to the absence of early symptoms, the lack of early detection methods, and limited effect of surgical resection even when in the context of multimodal treatments. Several epidemiologic PC risk factors have been identified, including cigarette smoking, heavy alcohol intake, type two diabetes mellitus, high BMI, and chronic pancreatitis (Maisonneuve and Lowenfels, 2015; Barone et al., 2016;Principe and Rana, 2020). The genetic susceptibility to PC is explained by rare high penetrance mutations identified through sequencing approaches (Hu et al., 2018) and common low penetrance variants discovered through genome-wide association studies (GWAS; Amundadottir et al., 2009;Low et al., 2010;Petersen et al., 2010;Wu et al., 2012;Wolpin et al., 2014;Childs et al., 2015;Zhang et al., 2016;Klein et al., 2018;Campa et al., 2020;Gentiluomo et al., 2020; Genetic factors play an important role in the susceptibility to pancreatic cancer (PC). However, established loci explain a small proportion of genetic heritability for PC; therefore, more progress is needed to find the missing ones. We aimed at identifying single nucleotide polymorphisms (SNPs) affecting PC risk through effects on micro-RNA (miRNA) function. We searched in silico the genome for SNPs in miRNA seed sequences or 3 prime untranslated regions (3'UTRs) of miRNA target genes. Genome-wide association data of PC cases and controls from the Pancreatic Cancer Cohort (PanScan) Consortium and the Pancreatic Cancer Case-Control (PanC4) Consortium were re-analyzed for discovery, and genotyping data from two additional consortia (PanGenEU and PANDoRA) were used for replication, for a total of 14,062 cases and 11,261 controls. None of the SNPs reached genome-wide significance in the meta-analysis, but for three of them the associations were in the same direction in all the study populations and showed lower value of p in the meta-analyses than in the discovery phase. Specifically, rs7985480 was consistently associated with PC risk (OR = 1.12, 95% CI 1.07-1.17, p = 3.03 × 10 −6 in the meta-analysis). This SNP is in linkage disequilibrium (LD) with rs2274048, which modulates binding of various miRNAs to the 3'UTR of UCHL3, a gene involved in PC progression. In conclusion, our results expand the knowledge of the genetic PC risk through miRNA-related SNPs and show the usefulness of functional prioritization to identify genetic polymorphisms associated with PC risk.  Feng et al., 2019;Gentiluomo et al., 2019a,b;Xu et al., 2019;Yang et al., 2019;Corradi et al., 2021). Only a small number of susceptibility loci for PC risk have been found thus far. Besides, previously established risk loci explain only a small proportion of genetic heritability for PC, and it is estimated that there may be over 1,700 susceptibility variants independently associated with PC risk (Zhang et al., 2020). Most disease-associated risk loci are located in non-coding regions of the genome, raising the possibility that the variants might influence gene expression through effects on transcription initiation, splicing or mRNA stability (Gallagher and Chen-Plotkin, 2018). Germline genetic variants, such as single nucleotide polymorphisms (SNPs) in genes encoding for micro-RNAs (miRNAs), or in 3 prime untranslated regions (3'UTRs) of the corresponding binding sites can affect miRNA transcription and the mRNA-miRNA interaction, with the consequent alteration of gene expression (Landi et al., 2012;Iuliano et al., 2013;Ryan, 2017). miRNAs have been shown to participate in the development of PC, by modulating multiple cellular processes (Yonemori et al., 2017;Rawat et al., 2019). miR-217, miR-96, and miR-126 have been shown to regulate KRAS, which is the signature mutation gene in pancreatic carcinogenesis Zhao et al., 2010;Jiao et al., 2012). Several studies have identified that miRNA-related SNPs could be associated with a range of diseases, including PC, lung cancer, colorectal cancer, gastric cancer, multiple myeloma, breast cancer, and attention-deficit/ hyperactivity disorder (Kupcinskas et al., 2014;Ryan et al., 2015;Zheng et al., 2016;Macauda et al., 2017;Petkevicius et al., 2017;Mosallaei et al., 2019;Abdi et al., 2020;Bahreini et al., 2020). In this study, we aimed at identifying SNPs in the mature miRNA genes and target sites, involved in PC development.

Study Populations
We utilized a two-phase (discovery and replication) approach. For the discovery phase, the study populations used were obtained from the Pancreatic Cancer Cohort Consortium (PanScan I-III) and the Pancreatic Cancer Case Control Consortium (PanC4), which provide the largest publicly available pancreatic cancer GWAS data of European ancestry. The data were downloaded from the NCBI database of genotypes and phenotypes (dbGaP; study accession numbers phs000206.v5.p3 and phs000648.v1.p1; project reference #12644). Each participating study (within PanScan I-III and PanC4) obtained informed consent from study participants, and approval from the responsible institutional review board (IRB), as described in the original papers (Amundadottir et al., 2009;Petersen et al., 2010;Wolpin et al., 2014;Childs et al., 2015;Zhang et al., 2016;Klein et al., 2018). Genotyping for PanScan was performed using the Illumina HumanHap550, Human 610-Quad, and OmniExpress arrays, for PanScan I, II, and III, respectively. Genotyping for the PanC4 GWAS was performed using the Illumina HumanOmniExpressExome-8v1 array. We conducted standard quality control of the genotype data and performed imputation using the Michigan Imputation Server 1  and the Haplotype Reference Consortium (HRC, V.r1.1) reference panel (McCarthy et al., 2016) for the datasets separately. SNPs with low imputation quality [INFO score r 2 < 0.7, minor allele frequency (MAF) <0.005 or call rate <0.9] were excluded after imputation. We subsequently merged the four imputed datasets and obtained a pooled dataset containing 8,769 cases and 7,055 controls for further analysis of 7,543,430 SNPs.
Summary statistics of the European Study of Digestive Diseases and Genetics (PanGenEU) were used for replication. PanGenEU has been described in detail elsewhere (Gomez-Rubio et al., 2017;Molina-Montes et al., 2020). Briefly, it is a case-control study conducted in Spain, Italy, Sweden, Germany, United Kingdom, and Ireland, between 2009 and 2014. IRB approval and written informed consent were obtained from all participating centers and study participants, respectively. DNA samples were genotyped using the Infinium OncoArray-500 K, and genotype imputation was performed using IMPUTE2 with 1,000 Genomes-phase 3 as reference panel (López de Maturana et al., 2021).
Samples from the PANDoRA consortium, mostly from European populations, were selected for genotyping as a replication set as well. Cases diagnosed with PC [mostly pancreatic ductal adenocarcinomas (PDAC), and 137 other exocrine pancreatic cancers] were collected from the PANDoRA consortium, which has been described previously (Campa et al., 2013b). Controls were collected in the same geographical regions as the cases, mostly in the context of the PANDoRA consortium. We also included additional German controls from ESTHER, and British and Dutch controls from the European Prospective Investigation on Cancer (EPIC), 2 which already have available GWAS data. ESTHER is a prospective cohort with 9,940 participants recruited during a general health check-up between July 2000 and December 2002 in the Saarland region of Germany. EPIC is an ongoing prospective cohort study that recruited healthy volunteers from the general population in 10 European countries (Riboli et al., 2002). EPIC samples were genotyped in the context of a GWAS using the Illumina Human 660 W-Quad BeadChip array. All subjects provided written informed consent and the ethical approval for the PANDoRA study protocol (including for controls from ESTHER and EPIC cohorts) was received from the Ethics Commission of the Medical Faculty of Heidelberg University.
The description of the study populations is shown in Table 1.

SNP Selection
We conducted a separate selection procedure for SNPs located in the miRNA seed regions (6-8 nucleotides at the 5' end of the mature miRNA) and SNPs located in the 3'UTRs of genes that are targeted by miRNAs. The flow chart of candidate SNP selection is shown in Figure 1.
The selection of SNPs located in mature miRNAs was processed using the database miRBase 3 (Kozomara et al., 2019) Frontiers in Genetics | www.frontiersin.org and the web-tool SNPnexus (Dayem Ullah et al., 2013). We obtained a list of 1,917 mature miRNAs from miRBase (release 22.1). Then, SNPnexus 4 was used to identify SNPs present in the seed region of those miRNAs by inserting the start and end position on the chromosome of each miRNA. The SNPs with MAF < 0.01 were excluded and 344 SNPs remained, among which 256 could be extracted and investigated in the PanScan and PanC4 populations. Around 10 SNPs were observed to be associated with pancreatic cancer risk with 4 https://snp-nexus.org p < 0.05 (ranging from 2.74 × 10 −3 to 0.049), and the top two most promising SNPs were forwarded to undergo a metaanalysis with the PanGenEU samples.
The selection of SNPs located in the 3'UTRs of miRNA target genes was conducted as follows: firstly, we analyzed the association of all 7.5 million SNPs in the PanScan and PanC4 pooled dataset with PC risk by logistic regression using an additive inheritance model adjusting for age, sex, and the top 10 principal components. Next, we extracted from the PanScan and PanC4 pooled dataset 2,575 SNPs that showed an association at p < 10 −4 . Among them, there were 32 SNPs predicted by HaploReg 5 to be located in the 3'UTR regions, and 46 SNPs in linkage disequilibrium (LD, r 2 > 0.6) with SNPs predicted by HaploReg to be located in the 3'UTR regions (Ward and Kellis, 2016). We performed further selection by removing SNPs located in known PC risk loci (N = 24 in six loci), SNPs that showed p ≥ 0.05 for association with PC risk in either PanScan or PanC4 (N = 18 in nine loci) and SNPs showing p > 5 × 10 −5 (arbitrary threshold) for association with PC risk in the combined PanScan + PanC4 dataset (N = 8 in eight loci). Twenty-eight SNPs in 10 loci were further meta-analyzed with PanGenEU samples to select promising SNPs to be genotyped in PANDoRA samples.
Genotyping DNA of PANDoRA samples was isolated from whole blood using QIAamp DNA extraction kit (Qiagen) and distributed in 384-well plates for genotyping. For quality control, 8% of the samples were randomly replicated throughout the plates and no-template controls were used in each plate. Genotyping was performed using TaqMan (ABI, Applied Biosystems, Foster City, CA, United States) probes on the PCR system. Viia7 instrument and Viia7 software (Applied Biosystems, Foster City, CA, United States) were used to detect the genotypes. After calling all the genotypes, we performed several QC steps. Samples with more than one missing genotype were removed. Duplicated samples with more than one discordant genotype were excluded as well. Deviation from Hardy-Weinberg equilibrium (HWE) distribution was checked, in controls, in the overall population and by country. After QC, 3,976 cases and 3,506 controls collected from PANDoRA were included for further analysis, and all the genotyped SNPs were in HWE (p > 10 −3 ).

Statistical Analysis
To investigate the association of the genotyped SNPs in PANDoRA, we performed unconditional logistic regression adjusting for sex, age, and country of origin. We then performed meta-analyses using the fixed-effects model (or random-effects model when p < 0.05 in the heterogeneity test) between the discovery phase (PanScan and PanC4) and the replication phase (PanGenEU and PANDoRA). For the analysis with the genotyped SNPs in PANDoRA, age, sex, and genotypes had missing rates between 1 and 5%. Considering that missing data can have a significant effect on the results, we applied multiple imputation, which is a missing data method that provides valid statistical inferences under the missing at random (MAR) condition (Graham, 2009), with the R package "mice, " which imputes incomplete multivariate data by chained equations (van Buuren and Groothuis-Oudshoorn, 2011). Meta-analyses were performed after multiple imputations as well. Analyses were carried out with R V3.6.

Bioinformatic Tools
We used the following tools/databases to explore the possible function of candidate SNPs: the Genotype-Tissue Expression 5 https://pubs.broadinstitute.org/mammals/haploreg/haploreg.php (GTEx) project portal 6 (accessed on 30 June 2020), to identify the possible effect of the SNPs on gene expression as expression quantitative trait loci (eQTL) or splicing quantitative trait loci (sQTL), HaploReg v4.1 (see Footnote 5), and RegulomeDB 7 to test the regulatory potential (Boyle et al., 2012;Lonsdale et al., 2013;Ward and Kellis, 2016). The predicted effects of the SNPs (known loci and novel loci proposed in this study) on binding of miRNAs to their targets were obtained from the miRNASNP 3.0 8 (Gong et al., 2015) and MirSNP 9 (Liu et al., 2012) databases. We checked differential gene expression in cancer and normal tissue using the Gene Expression Profiling Interactive Analysis (GEPIA) database 10 (Tang et al., 2017).

RESULTS
Meta-analysis for two SNPs in the miRNA seed regions and 28 SNPs selected by the miRNA target approach was performed with 1,317 PC cases and 700 controls from the PanGenEU study. The two miRNA seed SNPs (mir-182-rs76481776 and mir-4772-rs62154973) did not show a sufficiently significant association with PC risk (p = 5.02 × 10 −3 and p = 4.64 × 10 −2 , respectively), and were not prioritized for replication in PANDoRA.
On the contrary, four of the 28 SNPs selected by the miRNA target approach, which showed at least 5-fold lower value of p after adding PanGenEU were carried forward to the PANDoRA replication phase. The associations between these four SNPs and the risk of PC are shown in Figure 2 and Supplementary Table 1. rs7985480 was statistically significant (p < 0.05) in both PANDoRA and PanGenEU populations, and rs13246412 was observed to be associated with PC risk within the PanGenEU population. However, we did not observe genome-wide significant associations between the four SNPs and PC risk in the overall meta-analyses including all cases and controls. rs13246412 showed high heterogeneity in meta-analyses. The other three SNPs exhibited lower value of p in the meta-analysis in comparison with those observed in the discovery phase. Eighty-seven putatively affected miRNAs were identified using the miRNASNP and mirSNP databases for four known PC risk loci (8q21.13, 13q12.1, 16q23.1, and 22q12.1) and four novel loci ( Table 2). Each SNP has a different, independent effect on each different miRNA. Within the four genotyped SNPs, only rs13246412 is located in a 3'UTR. Thus, we predicted the miRNA binding effects for the other three genotyped SNPs using their linked SNPs located in the 3'UTR regions. rs1063192 (in LD with rs4977756, r 2 = 0.78, D' = 0.95) is predicted to have different effects across multiple miRNAs. Seven different miRNAs are predicted to bind UCHL3 at the location of rs2274048 (in LD with rs7985480, r 2 = 0.63, D' = 1). When the allele changes from G to T, the miRNA-mRNA binding is predicted to lose strength for all of them. The T allele of Frontiers in Genetics | www.frontiersin.org 6 August 2021 | Volume 12 | Article 693933 rs4677, located in the 3'UTR of the gene BTBD6, is predicted to create a stronger binding site for miR-3622b-5p (rs2975216 is in strong LD with rs4677, r 2 = 0.90, D' = 1). Additionally, eQTL analysis suggests that the C allele of rs2975216 was associated with higher expression of BTBD6 (p = 1.9 × 10 −7 ) and of BRF1 (p = 1.5 × 10 −5 ) in pancreas. rs7985480-T was associated with increased expression of LMO7-AS1 in adipose-subcutaneous tissue (p = 1.7 × 10 −14 ). HaploReg and RegulomeDB did not show evidence for functional effect for these two variants. No significant eQTLs (p < 0.05) were seen for the SNPs at 7q32.3 (rs13246412) and 9p21.3 (rs4977756).
In this study, we evaluated the associations between miRNArelated genetic variants (in miRNA genes or target binding sites) and PC risk. A possible limitation of the agnostic GWAS approach is that the typically applied genome-wide significance threshold (p < 5 × 10 −8 ) might be overly conservative, which may lead to missing true disease risk loci. Our strategy in this study is to use functional annotation to prioritize SNPs, thereby using a low significance threshold for inclusion (p < 10 −4 ), thus making it easier to potentially reach the genome-wide significance threshold with replication in additional samples. We surveyed the whole set of miRNA-related SNPs in the genome and selected four for full-scale analysis using a combined sample size of 14,062 pancreatic cancer cases and 11,261 controls. None of these SNPs reached genome-wide significance in the overall meta-analysis; however, we observed that the association between rs4977756 (9p21.3), rs7985480 (13q22.2), and rs2975216 (14q32.33) and PC risk was consistent in both the discovery and replication phase, with lower value of p in the overall meta-analyses compared with the discovery phase. In particular, rs7985480, in LD with rs2274048, located in the 3'UTR of UCHL3, was associated with pancreatic cancer risk with at least p < 0.05 in all datasets.
The most notable eQTL in this analysis was observed for rs2975216 at 14q32.33, where the risk-increasing C allele was associated with higher BRF1 and BTBD6 expressions in histologically unaffected pancreatic tissue samples. BRF1 encodes one of the three subunits of the RNA polymerase III transcription factor complex. BTBD6 is a protein coding gene, and it is involved in the innate immune system and class I MHC-mediated antigen processing and presentation pathways (as defined by GeneCards). rs297526 is not located in a 3'UTR region, but is in high LD with rs4677 (r 2 = 0.90), located within the 3'UTR region of BTBD6. rs4677-T correlated with rs2975216-T, that in our data is associated with lower PC risk and is predicted to create a stronger binding site for miR-3622b-5p, which may repress the expression of BTBD6, consistent with the eQTL analysis of rs2975216. A recent transcriptome-wide association study for PC also identified BTBD6 as a possible risk locus (Rawat et al., 2019;Zhong et al., 2020). Hence, we investigated the transcriptional data of BTBD6 from the GEPIA database, and we found that the expression levels of BTBD6 were indeed higher in PC tissues than in normal pancreas tissues. This suggests that increased expression of BTBD6, either by somatic events or constitutively through genetic polymorphisms, could be implicated in the etiology of PC.
In our study, four known risk loci are predicted to have different effects across multiple miRNAs. This shows that our approach is helpful to step from the statistical associations into the functional understanding of biological mechanisms underlying the risk of disease.
The two-phase approach contributes to decreasing the possibilities of spurious findings and offers a great strength. Although none of the SNPs we investigated with the full available sample size reached the genome-wide statistical significance, they remain attractive candidates. It may still be worth to include these relevant functional SNPs into polygenic risk scores for risk stratification.

DATA AVAILABILITY STATEMENT
The PanScan and PanC4 genotyping data are available from the database of Genotypes and Phenotypes (dbGaP, study accession numbers phs000206.v5.p3 and phs000648.v1.p1). The PANDoRA and PanGenEU primary data for this work will be made available to researchers who submit a reasonable request to the corresponding author, conditional to approval by the PANDoRA Steering Committee and Ethics Committees of the Medical Faculty of Heidelberg University, Germany, and of the Health Institute Carlos III (ISCIII), Madrid, Spain, respectively. Data will be stripped from all information allowing identification of study participants.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by PANDoRA Steering Committee and Ethics Committee of the Medical Faculty of Heidelberg University (S-565/2015), Germany, and of the ISCIII, Spain. The patients/ participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
DC and FC conceived and designed the study. YL performed the lab work. YL, CC, MGe, and EL performed data quality control and statistical analyses. YL and CC drafted the manuscript. All other authors provided samples and data. All authors critically read, commented and approved the final manuscript.

ACKNOWLEDGMENTS
The research used the genotyping data provided by the EPIC, we would like to thank the contributors from the United Kingdom. The EPIC-Norfolk study (DOI 10.22025/2019.10.105.00004) has received funding from the Medical Research Council (MR/N003284/1 and MC-UU_12015/1) and Cancer Research United Kingdom (C864/ A14136). We are grateful to all the participants who have been part of the project and to the many members of the study teams at the University of Cambridge who have enabled this research.