Integrative Analysis Identified IRF6 and NDST1 as Potential Causal Genes for Ischemic Stroke

Objective: To highlight potential functional variants and causal genes for ischemic stroke (IS) in genomic loci identified by genome-wide association studies (GWAS). Methods: We examined the association between m6A-SNPs and IS in large scale GWAS. Furthermore, eQTL analysis was performed to evaluate the effect of m6A-SNPs on gene expression. The top associations between m6A-SNPs and gene expressions were validated in 40 individuals from the Chinese Han population. Besides, we applied differential expression analysis and Mendelian randomization (MR) analysis to detect potential causal genes for IS. Results: We found 310 (7.39%) m6A-SNPs which were nominally associated with IS. The proportion of m6A-SNPs with P < 0.05 for IS was significantly higher than the non-m6A-SNPs (95%CI: [5.84%, 7.36%], P = 0.02). We found that the IS-associated m6A-SNP rs2013162 was associated with IRF6 expression (P = 6.30 × 10−23), meanwhile IRF6 was differentially expressed between IS cases and controls (P = 6.15 × 10−3) and showed a causal association with IS (P = 3.64 × 10−4). Similar results were found for m6A-SNP rs2273235 in the NDST1 gene which was associated with cardioembolic stroke (P = 8.47 × 10−3). The associations of rs2013162 and rs2273235 with the expression of IRF6 and NDST1 were validated in blood cells (P = 0.0247 and 0.0007), respectively. Conclusions: This study showed that m6A-SNPs may affect IS risk through altering gene expressions. The results suggested that m6A might play a role in IS etiology and gene expressions that affected by m6A may be causal factors for IS.


INTRODUCTION
Ischemic stroke (IS) is the second leading cause of death worldwide (1). As a complex disease, genetic and epigenetic factors play important roles in IS etiology. Several genome-wide association studies (GWAS) have successfully identified many loci for IS and specific subtypes, including large artery stroke (LAS), cardioembolic stroke (CES), and small vessel stroke (SVS). A recent large-scale meta-analysis of GWAS in 521,612 individuals confirmed 32 IS-associated loci which offering mechanisms not previously implicated in stroke pathophysiology (2). Although previous GWAS have revolutionized the understanding of the genetic architecture of IS and provided a framework for prioritization of stroke risk variants and genes for further functional and experimental follow-up, identification of functional variants and causal genes in the GWAS loci is not finished but still a major challenge. N 6 -methyladenosine (m 6 A) is a pervasive RNA modification that plays critical roles in mRNA stability, protein expression and several other cellular processes (3). Dysregulated m 6 A has been linked to cell fate during the endothelial-to-hematopoietic transition (4), cardiac homeostasis (5, 6) and brain diseases (7). Genetic variants, i.e., the m 6 A-associated SNPs (m 6 A-SNPs), can influence m 6 A by changing the RNA sequences of the target sites (8). If m 6 A modification was affected by this kind of variants, the biological process would likely be modified, leading to under-/overexpression of the protein (8).
Evaluation of the effect of genetic variants on m 6 A modification will increase our understanding of the pathogenic molecular mechanisms and uncover new causal variants. Until now, the relationship between m 6 A-SNPs and IS has not yet been clearly defined. Besides, determination of the association between m 6 A and IS in large sample at genome-wide scale in a large sample is hard to achieve nowadays. In this study we investigated the effect of the m 6 A-SNPs on IS and showed that by using the GWAS identified IS-associated m 6 A-SNPs as a bridge we can assess the relationship between m 6 A and IS indirectly. Meanwhile, we highlight some potential causal genetic variants and genes for IS.

Identification of m 6 A-SNPs for IS
In this study, we first investigated the effect of the m 6 A-SNPs on IS in the published summary data of a large scale GWAS (2). This GWAS comprised 521,612 individuals. Raw data used in the present analysis was the downloaded summary results from the initial GWAS, which included association P values of almost 8 million SNPs and indels for any IS (AIS) and common etiological subtypes of LAS, CES, and SVS. These datasets were available at the MEGASTROKE website (http://megastroke.org/).
To screen out the m 6 A-SNPs in these 8 million SNPs, we annotated them using a list of m 6 A-SNPs which were downloaded from the m6AVar database (http://m6avar.renlab. org/). The list contains 13,703 high, 54,222 medium and 284,089 low confidence level m 6 A-SNPs for human (8). After annotation of the SNPs in the GWAS summary dataset by the list of m 6 A-SNPs, we identified the m 6 A-SNPs which were associated with IS. Those m 6 A-SNPs with P < 0.05 were considered in the following analyses.
Among IS-associated SNPs, we determined if m 6 A-SNPs were overrepresented compared to what would be expected by chance. We randomly sampled 1,000 sets of non-m 6 A-SNPs (the same number of m 6 A-SNPs) from the GWAS datasets for IS as matched background, and then determined if the proportion of m 6 A-SNPs with P < 0.05 was significantly higher than the proportion of non-m 6 A-SNPs with P < 0.05 in the 1,000 sets for each trait. Besides, to assure that allele frequency differences between m 6 A and non-m 6 A-SNPs are not driving the conclusion that there is overrepresentation of m 6 A-SNPs in stroke cases, we selected from allele frequency bins of 1-5, 5-10, 11-20, 21-30, 31-40, and 41-50% for each set of non-m 6 A-SNPs to more precisely mirror the distribution of m 6 A-SNPs.

eQTL Analysis
The m 6 A-SNPs may participate in gene expression regulation through exerting influence on RNA modification, thus they may be associated with gene expression level. We carried out the cis-acting eQTL analysis to obtain evidence on associations between the identified m 6 A-SNPs and gene expressions in HaploReg (https://pubs.broadinstitute.org/mammals/haploreg/ haploreg.php). To validate the eQTLs we also tested genotypes and mRNA expression levels in peripheral blood mononuclear cells (PBMCs) of 40 unrelated Chinese Han individuals (age range from 27 to 67) using RT-PCR method to obtain additional evidence to support the top IS-associated m 6 A-SNPs. PBMCs were isolated from 15 ml peripheral blood by density gradient centrifugation using Lymphoprep (Sigma, life science, USA). Total RNA and DNA were extracted in the same lab according to the instructions recommended by the manufacturer. The study was approved by the ethical committee of Soochow University. The written informed consent was obtained from all of the participants.

Differential Expression Analysis
We further tried to determine if the expression levels of genes which the IS-associated m 6 A-SNPs showed cis-eQTL effects on were associated with IS based on the expression profile data (GSE22255) available in the GEO database (http://www.ncbi. nlm.nih.gov/geo). GSE22255 contained data of gene expression levels in PBMCs from 20 IS cases to 20 controls (9). Differential expression was tested by comparing mean gene expression signals between cases and controls using t-test. The significance level of P = 0.05 was used for the differential expression analyses.

Mendelian Randomization Analysis
We supposed that the IS-associated m 6 A-SNPs affect gene expression and consequently cause IS. To obtain additional evidence to support this idea, we conducted a summary databased Mendelian randomization (SMR) analysis (10). SMR applies the principles of MR (11,12) to jointly analyze eQTL and GWAS summary statistics in order to test for association between gene expression and a trait due to a shared variant at a locus. A HEIDI (heterogeneity in dependent instruments) test for heterogeneity in the resulting association statistics was performed. P HEIDI > 0.05 means that there was no significant heterogeneity underlying the eQTL signals. The SMR software was downloaded from http://cnsgenomics.com/ software/smr/. Genotype data of HapMap r23 CEU was used as a reference panel to calculate the linkage disequilibrium correlation for SMR analysis. The eQTL summary data from four studies were used in our SMR analysis. Westra et al. performed the largest eQTL meta-analysis so far in peripheral blood samples of 5,311 European healthy individuals (13). The genetic architecture of gene expression (GAGE) study detected eQTLs in peripheral blood in 2,765 European individuals (14). The cis-eQTL summary data from the GTEx whole blood (15) and brain (16) were used. Only SNPs within 1 Mb of the transcription start site are available for these two GTEx datasets.

Differential Expression Analysis
For the genes which were presented in Table 1, we compared mRNA expression signals in an expression study. In this expression dataset, we found that IRF6, CUX2, PIK3CD and NDST1 were differentially expressed in PBMCs (P < 0.05) ( Table 1). Among them, the expression levels of IRF6 and NDST1 were affected by rs2013162 (P = 6.30 × 10 −23 ) and rs2273235 (P = 3.23 × 10 −7 ) according to eQTL data from HaploReg, respectively. It means that these 2 m 6 A-SNPs may affect IS through altering the local gene.

Mendelian Randomization Analysis
The SMR analysis identified several genes as potential causal genes underlying IS GWAS association (P SMR < 5 × 10 −6 ), and there was no significant heterogeneity underlying the eQTL signals (P HEIDI > 0.05) ( Table S2). For the 15 genes listed in Table 1, SMR tests (P < 3.33 × 10 −3 were considered significant) identified that three genes (IRF6, RBM18 and KCNJ11) were significantly associated with AIS, IL16 was associated with LAS and NDST1 was associated with CES ( Table 1). All of these five genes passed the HEIDI tests (P HEIDI > 0.05).

The SNP-Expression-IS Trios
We noticed that IRF6 and NDST1 showed more convincing evidence. For example, the IFR6 SNPs achieved suggestive evidence of association with AIS (P < 5 × 10 −5 ) (Figure S1). The AIS-associated (P = 1.27 × 10 −4 ) m 6 A-SNP rs2013162 may affect IRF6 expression (P = 6.30 × 10 −23 ), IRF6 was differentially expressed between IS cases and controls (P = 6.15 × 10 −3 ) and showed a causal association with AIS (P SMR = 3.64 × 10 −4 , P HEIDI = 0.11). The association between NDST1 SNPs and CES did not reach the genome-wide significance level but many SNPs that were in linkage disequilibrium showed suggestive evidence (Figure S2). The CES-associated (P = 3.99 × 10 −4 ) m 6 A-SNP rs2273235 may affect NDST1 expression (P = 3.23 × 10 −7 ), NDST1 was differentially expressed between IS cases and controls (P = 2.65 × 10 −2 ) and showed a causal association with CES (P SMR = 8.47 × 10 −3 , P HEIDI = 0.65). So subsequently, we tested the association between rs2013162 and rs2273235 and mRNA expression levels of IRF6 and NDST1, respectively. The MAF for rs2013162 and rs2273235 were 0.3736 and 0.4821 in Europeans (Table 1), and 0.3721 and 0.2674 in the Han Chinese population according to our 40 sample, respectively. These two SNPs were high-frequency variants in both of the European and East Asian populations. We validated that rs2013162 and rs2273235 were significantly associated with IRF6 and NDST1 expression levels in PBMCs from the Chinese Han individuals (linear regression P = 0.0247 and 0.0007) (Figures 2, 3), respectively.

DISCUSSION
This study represents the first effort to identify putative functional variants and causal genes for IS by integrating m 6 A-SNPs data, gene expression data and genetic association data from large scale GWAS. We found out several m 6 A-SNPs (e.g., rs2013162 and rs2273235) which may be putative functional  Frontiers in Neurology | www.frontiersin.org variants for IS. Moreover, we also showed that these SNPs have potential to affect transcription regulation and were associated with expressions of the local genes (e.g., IRF6 and NDST1). These genes may be potential causal genes for IS. It seems that m 6 A-SNPs may be causally associated with IS by influencing gene expression. Studies have shown that m 6 A modification plays a pivotal role in the regulation of downstream molecular events such as nuclear export, stability, translatability, splicing, and miRNA processing (18). So one of the functional interpretations for the effect of m 6 A-SNPs on IS could be their influence on gene expression levels. Then if the m 6 A-SNP is causal SNP, the affected gene should be causal gene. In this study we detected several potential causal genes for IS based on large data of eQTL and GWAS. For these genes, we validated the association between m 6 A-SNPs and gene expression (e.g., IRF6 and NDST1) in public and in-house data. This finding supported our hypothesis that m 6 A-SNPs may be causally associated with IS by influencing gene expression.
The IRF6 gene encodes a member of the IRF family of transcription factors which play roles in cardiovascular diseases (19,20). IRF6 is likely to promote inflammation to Porphyromonas gingivalis through its regulation of IL-36γ (21), and exhibits tumor suppressor activity in squamous cell carcinomas (22). The RIPK4-IRF6 signaling axis plays a multifaceted role in barrier epithelial homeostasis through its regulation of both keratinocyte inflammation and differentiation (23). IRF6 also contributes to host defense by providing specificity to the regulation of inflammatory chemokine expression by TLR2 in epithelial cells (24). The synonymous m 6 A-SNP rs2013162 in IRF6 (1q32.2) has been well studied on disease. This SNP was not reported in the original METASTROKE GWAS (2). But by looking up the published  GWAS data, we found that rs2013162 was also associated with intracerebral hemorrhage (P = 3.26 × 10 −4 ) (25) and large artery atherosclerosis-related stroke (P = 9.34 × 10 −3 ) (26).
NDST1 encodes a bifunctional GlcNAc N-deacetylase/Nsulfotransferase with important functions in biosynthesis of heparan sulfate, which play roles in triglyceride-rich lipoprotein clearance (27), stroke (28,29), and allergic airway inflammation through the regulation of recruitment of inflammatory cells to the airways by mediating interaction of leukocytes with the vascular endothelium (30). Suppression of NDST1 in endothelial cells results in reduced responsiveness to VEGFA (31), which plays important role in cardiovascular diseases. The synonymous m 6 A-SNP rs2273235 in NDST1 (5q33.1) was associated with CES in two GWAS (2,32). This SNP has the potential to alter regulatory motifs Evi-1, NF-AT1 and PTF1-beta, and locates in a CpG island (CpG: 19) and near DNase I hypersensitive sites ( Figure S3). All in all, the identified m 6 A-SNPs and genes can be suggested as important candidates for further genetic association and functional studies.
This study has several limitations. First, no genome-wide significant associations between m 6 A-SNPs and IS was found. The significant associations proposed in this paper were at the borderline level of statistical significance after multiple testing adjustments. Second, the sample size of the differential gene expression study was small, so very few associations between gene expression levels and IS have been identified. Third, only a very small proportion of m 6 A-SNPs were examined in this study. To fully recognize the impact of m 6 A-SNPs on IS, we still need to identify more m 6 A-SNPs (especially the rare variants) and the effects of the large amounts of m 6 A-SNPs on IS should be evaluated in larger GWAS datasets. Finally, the functionalities of the detected SNPs, especially effects on m 6 A modification, have not been experimentally validated. Further experiments are needed to determine their functions.

CONCLUSIONS
In summary, the present study found out several IS-associated m 6 A-SNPs, and showed that these SNPs may affect the expressions of the local genes which might be potential causal genes for IS. This study increases our understanding on the regulation patterns of SNP. Although we found supplementary functional information to support the significant findings, further functional studies were needed to elucidate the mechanisms.

DATA AVAILABILITY
All datasets generated for this study are included in the manuscript and/or the Supplementary Files.

ETHICS STATEMENT
This study was carried out in accordance with the recommendations of Soochow University with written informed consent from all subjects. All subjects gave written informed consent in accordance with the Declaration of Helsinki. The protocol was approved by the Soochow University.

AUTHOR CONTRIBUTIONS
X-BM, S-FL, Y-HZ, and HZ contributed at all stages of manuscript preparation. HZ critically appraised and approved the final manuscript. All authors were involved in data recording and discussed the results.