DATA REPORT article
Genome-Wide Identification of N6-Methyladenosine (m6A) SNPs Associated With Rheumatoid Arthritis
- 1Center for Genetic Epidemiology and Genomics, School of Public Health, Medical College of Soochow University, Suzhou, China
- 2Jiangsu Key Laboratory of Preventive and Translational Medicine for Geriatric Diseases, Soochow University, Suzhou, China
- 3Department of Epidemiology, School of Public Health, Medical College of Soochow University, Suzhou, China
Previous large scale genome-wide association studies (GWAS) have identified more than one hundred susceptibility loci for rheumatoid arthritis (RA) (Stahl et al., 2010; Eyre et al., 2012; Okada et al., 2012, 2014b). One major issue in the post GWAS era is to identify functional (causal) variants in the disease-associated loci. Recently, whole-exome sequencing studies have identified several missense mutations for RA (Mitsunaga et al., 2013; Okada et al., 2014a). Nevertheless, few studies have focused on potentially functional variants which could affect N6-methyladenosine (m6A) methylation.
N6-methyladenosine (m6A) has been discovered as the first example of reversible RNA methylation and is a pervasive modification in the mRNA of all eukaryotes. It plays critical roles in regulating many fundamental biological processes such as gene expression control (Meyer and Jaffrey, 2014) and regulation of mRNA stability (Wang et al., 2014) and homeostasis (Edupuganti et al., 2017). Evidence has increasingly shown that m6A modification modulates almost all aspects of RNA processing such as nuclear export, translatability, splicing, and miRNA processing (Visvanathan and Somasundaram, 2018). Thus, m6A modification adds a new layer of gene expression regulation, and it has been implicated in T cell differentiation, homeostasis, and response to HIV infection (Li et al., 2018). Moreover, it was found to be involved in the etiology of various diseases (Visvanathan and Somasundaram, 2018). Given the vital function of m6A modification in gene expression regulation and immune response, it is possible that m6A is involved in the etiology of RA.
Recent studies have suggested that SNPs would affect m6A by altering the RNA sequences of the target sites or key flanking nucleotides (Zheng et al., 2018). It means that m6A-associated SNPs (m6A-SNPs) may have regulatory potential to affect gene expression and mRNA stability and homeostasis, which may consequently affect disease such as RA. However, what is the relationship between m6A-SNPs and RA is still unclear. Thus, we examined the effect of the m6A-SNPs on RA in a large scale GWAS. Then, to test whether the RA-associated m6A-SNPs have potential regulatory effects on gene expression and whether these disruptions may contribute to RA, we performed eQTL and differential expression analyses using whole genome data of about 10.5 million SNPs and 21,323 mRNAs from 26 RA cases and 17 controls, as well as public data.
Value of the Data
• We found 37 m6A-SNPs were associated with RA at genome-wide significance level (P < 5.0 × 10−8) in Asians or Europeans.
• A heap of RA-associated m6A-SNPs at the 6p21-6p22 which contains genes encode the major histocompatibility complex were detected, such as rs1033500 (P = 1.0 × 10−250) in C6orf10 and rs1042136 (P = 2.5 × 10−34) in HLA-DPB1.
• We detected associations of 27 m6A-SNPs with expressions of 24 local genes. By integrating the RA-associated m6A-SNPs with gene expression data, we formed 23 SNP-Gene-RA trios.
• The present data increase our understanding on the regulation patterns of SNP, and may provide new insight into the mechanism underlying the associations between SNPs and RA.
Materials and Methods
Determination of m6A-SNPs for RA
We first investigated the effect of m6A-SNPs on RA in data of a large scale GWAS (Okada et al., 2014b), which was publicly available at http://plaza.umin.ac.jp/~yokada/datasource/software.htm. This GWAS comprised 19,234 RA cases and 61,565 controls from European and Asian populations. The downloaded datasets contained summary results of almost 6.6 million SNPs. The m6A-SNPs in these 6.6 million SNPs were sorted out according to a list of m6A-SNPs in the m6AVar database (http://m6avar.renlab.org/). The m6AVar database currently contains 303,001 m6A-SNPs for human (Zheng et al., 2018). We have not validated the associations between the identified m6A-SNPs and RA in another independent sample. However, these SNPs were located in RA loci that were confirmed in previous GWAS, so the associations should be reliable.
Twenty-six female patients suffering from RA were included into this study. All patients had been diagnosed as RA following the 2010 American College of Rheumatology/European League Against Rheumatism (ACR/EULAR) criteria (Aletaha et al., 2010). In this study, the EULAR Disease Activity Score (DAS28) (Prevoo et al., 1995) of the 26 patients ranges from 2.91 to 6.41. The mean age of the patients was 47.39 (±10.71). A total of 17 female subjects without RA or other autoimmune diseases were recruited as controls. The mean age of the controls was 47.11 (±14.09). Genomic DNA and total RNA in PBMCs were extracted according to the instructions recommended by the manufacturer. The study was approved by the ethical committee of Soochow University. The written informed consents were obtained from all the subjects.
Genotyping and mRNA Expression Profiling
Affymetrix Genome-Wide Human SNP Array 6.0 chips were employed for SNP genotyping by following the protocol recommended by the manufacturer. After quality control and genotype imputation by the 1000 Genomes project phase 3 reference panel using FISH software (Zhang et al., 2014), about 10.5 million SNPs could be used in the following analyses. Genome-wide mRNA expression was profiled using mRNA Human Gene Expression Microarray V4.0 (CapitalBio Corp, Beijing, China) according to the manufacturer's instructions. After further filtering out probes with detection rate less than 80% and/or incomplete annotation information, a total of 21,323 unique mRNAs were used for further analysis.
eQTL Analysis for the RA-Associated m6A-SNPs
The RA-associated m6A-SNPs may have regulatory potential to affect gene expression and mRNA stability and homeostasis, which may result in variations of mRNA levels. We carried out the cis-acting eQTL analysis in PBMCs of the 43 subjects to obtain functional evidence for the identified m6A-SNPs. The m6A-SNPs affect m6A by altering the mRNA sequences of the target sites or key flanking nucleotides, so we only focused on the relationship of the m6A-SNP with the expression of the local gene. The distance from SNP position to either starting point of or ending point of the mRNA transcript was 0 bp (the tested m6A-SNPs all locate inside genes) in the cis-eQTL analysis, which was carried out by using the R package MatrixeQTL (Shabalin, 2012). We applied linear regression analyses to assess the associations between SNPs and mRNA expressions, adjusted for RA. In addition, we also performed the cis-eQTL analysis in the HaploReg database (Ward and Kellis, 2012). Except for eQTL effects, we searched for potential functionalities in transcription regulation, such as altering protein binding, in HaploReg and RegulomeDB for the identified m6A-SNPs. ENCODE data were also searched in UCSC Genome Browser to identify more information for the significant m6A-SNPs.
Differential Expression Analysis
We further tried to determine if the disruption of gene expression by the RA-associated m6A-SNPs could affect RA. We examined the differential expression of the identified genes that the m6A-SNPs showed significant cis-eQTL signals with in our in-house dataset (26 RA cases and 17 controls) in PBMCs. Since the PBMCs consist of multiple immune cells including monocytes, CD8 T cells, B cells, regulatory T cells (Tregs), and activated natural killer (NK) cells, we further inferred the expressions of these 24 genes in various cell subgroups from PBMCs by using the CIBERSORT algorithm. The CIBERSORT algorithm is a method using support vector regression for cell type deconvolution. We did deconvolution for the 26 RA cases and 17 controls with 100 iterations and a default input matrix of cell-type specific gene expression signatures of 22 distinct immune cell types (LM22) (Newman et al., 2015).
Besides, we also detected differential expression genes based on three public gene expression datasets, GSE15573 (peripheral blood mononuclear cells) (Teixeira et al., 2009), GSE17755 (peripheral blood cells) (Lee et al., 2011), and GSE1919 (synovial tissues) (Ungethuem et al., 2010), which were downloaded from GEO database. Differential expression was tested by comparing mean gene expression signals between cases and controls using t-test. The significance level of P = 0.05 was used for the differential expression analyses. Differential expression genes detected in at least one of the three studies were considered.
Results and Discussion
The first step of this study was to pick out m6A-SNPs from the RA GWAS dataset which containing 6.6 million SNPs according to the annotation information of the 313 thousand m6A-SNPs in the m6AVar database. We found 3,883 unique m6A-SNPs which located not only in protein-coding genes, but also non-coding RNAs. Among these SNPs, 227 and 308 (476 unique) were nominally (P < 0.05) associated with RA in Asians and Europeans, respectively. Specifically, 9 and 32 (38 unique) m6A-SNPs were significant at genome-wide level (P < 5.0 × 10−8) in Asians (Figure 1A) and Europeans (Figure 1B), respectively. Among these genome-wide significant m6A-SNPs, the effects of rs1046080, rs13978, rs2736158, and rs58892873 on m6A were confirmed by miCLIP/PA-m6A-Seq experiments. The effects of another 8 SNPs on m6A were confirmed by MeRIP-Seq experiments. The rest 26 fell within the low confidence categories (predicted to be associated with m6A). The effects of the predicted SNPs on m6A modification have not been validated by technical and biological experiments. Further experiments are needed to determine their effects in large scale studies (e.g., QTL mapping).
Figure 1. Genome-wide results for the association between m6A-SNPs and RA. The Manhattan plots show –log10P values for the association of 2,672 m6A-SNPs with RA in Asians (A) and the association of 3,667 m6A-SNPs with RA in Europeans (B). The data was from the RA GWAS published at 2014. SNPs reached the genome-wide significance level (P < 5.0 × 10−8) were annotated.
Most of these SNPs were in the major histocompatibility complex (MHC) region, except rs3748816 (P = 2.8 × 10−8, MMEL1) and rs2305480 (P = 5.0 × 10−9, GSDMB) (Supplementary Table S1). The m6A-SNP rs3748816 in MMEL1 at chromosome 1 was very near in physical with the reported SNP chr1:2523811. Similarly, the m6A-SNP rs2305480 in GSDMB at chromosome 17 located next to the reported SNP chr17:38031857. Although studies have made efforts to define the association across this region and identify functional and potentially causal variants, pinpointing those loci were still challenging, in part due to the complexity and the broad linkage disequilibrium characteristic of the MHC (Raychaudhuri et al., 2012). Our study illustrated how identification of m6A-SNPs from the GWAS dataset can help fine-map association signals in the MHC region.
In addition, we examined the associations of SNPs in the key m6A regulators encoded genes METTL3, METTL14, WTAP, FTO, and ALKBH5 with RA. Unfortunately, very few SNPs in these genes were examined in the original GWAS and we found no genome-wide significant association. The only one SNP with P < 0.0001 was rs55982358 in FTO (P = 5.6 × 10−5 in Europeans). Five SNPs in METTL3, 1 SNPs in METTL14, 1 SNPs in WTAP, 159 SNPs in FTO and 21 SNPs in ALKBH5 with P < 0.05 were found for Asian or European population. These SNPs were not in the list of m6A-SNPs in the m6VAR database, but they should also be m6A associated because genetic variants in these genes might have potentials to influence the key m6A proteins.
To further clarify the possible functional mechanisms underlying the identified m6A-SNPs in association with RA, we investigated whether they were associated with the expression of local genes (Supplementary Tables S4 and S5). In total, 27 (13 for Asians and 20 for Europeans) of the examined RA-associated m6A-SNPs (P < 0.0001 in Asians or Europeans) showed cis-eQTL (SNP affects gene expression at the same locus) signals with the 24 corresponding genes in different cells or tissues (Supplementary Table S1). Among these SNPs, 12 were non-synonymous single nucleotide variants, and 17 were genome-wide significant (P < 5.0 × 10−8) (Supplementary Table S1). Some of these m6A-SNPs showed eQTLs in different tissues or cells (Supplementary Table S2). Six of the 27 SNPs showed associations in both Asians and Europeans, including rs2305480 (GSDMB), rs1042308 (HLA-DPA1), rs3748816 (MMEL1), rs241449 (TAP2), rs2857433 (TRIM10), and rs757262 (TRIM40). For the associations between the eQTL m6A-SNPs and RA in Asians, only rs241449 (TAP2) and rs1042308 (HLA-DPA1) were significant at genome-wide scale.
A proportion of m6A-SNPs were missense mutations or variants which could influence transcription. The major function of missense mutation was changing the encoded amino acids. However, the functions of the missense m6A-SNPs should not be confined to amino acid change, but RNA processing. Studies have suggested that m6A-SNPs could also influence promoter activity, mRNA stability and subcellular localization of mRNAs and/or proteins (Shastry, 2009). The functions of m6A-SNPs in UTRs should not be confined to alter microRNA or transcription factor bindings either. Several significant m6A-SNPs in the MHC region also found to locate in transcription factor binding sites, DNase I hypersensitive sites or CpG islands (Figures S1–S7). The missense mutation rs3748816 in MMEL1 (1p36.32) locates in a DNase I hypersensitive site and very closed to a CpG island (Figure S8).
Differential Expression Analysis
For the above 24 genes, we compared mRNA expression signals in PBMCs of 26 RA cases and 17 controls and 3 published expression studies. In these 4 expression datasets, we found that 20 (17 in PBMCs) of them were differentially expressed in at least one study (P < 0.05) (Table 1, Supplementary Table S1). Among them, MAPK13 and HLA-DPB1 were differentially expressed in PBMCs and synovial tissues. There were 23 RA-associated m6A-SNPs in these 20 differential expression genes. HLA-A contained 2 (rs1136702 and rs3173419) and HLA-C contain 3 (rs1131123, rs35075694, and rs707908) RA-associated m6A-SNPs. Therefore, in total, we found 23 m6A-SNPs that formed 23 SNP-Gene-RA trios (Table 1). It means that these m6A-SNPs may affect RA risk through altering the expressions of local genes. Most of these m6A-SNPs showed eQTLs in multiple tissues or cells (Supplementary Table S2). We further searched for potential functionalities for these 23 m6A-SNPs in transcription regulation, such as altering protein binding. In RegulomeDB and HaploReg, we found that 10 of the 23 m6A-SNPs might alter the binding of one or more proteins and 16 might alter regulatory motifs in different cell types (Supplementary Table S2). However, the effects of the detected SNPs on m6A modification have not been validated by technical and biological experiments. Further experiments are needed to determine their functions in human cell lines.
As shown in Supplementary Table S3, CIBERSORT algorithm inferred the relative proportions of immune cells in PBMCs of the 26 RA cases and 17 controls. The PBMCs mainly contained monocytes (highest proportion), CD8 T cells, B cells, regulatory T cells (Tregs), and activated natural killer (NK) cells. Significantly (Bonferroni correction) lower proportions of B cells were observed in RA patients. Further we performed the differential expression analyses for the 17 genes in each cell subgroup of PBMCs. We found that the differences exist for 17 genes in B cells, 14 genes in NK cells, 4 genes (C6orf10, HLA-A, TRIM10, and TRIM27) in CD8 T cells, and only TRIM39 in monocytes (Supplementary Table S3). The above results showed that the mean proportions of cell subgroups varied and were significantly different between RA cases and controls (e.g., B cells). Besides, the differences between the cell type components seemed to be associated with the number of significant genes (e.g., 17/17 in B cells; 14/17 in NK cells). These results were not beyond our expectation because PBMCs is a group of mixed cells, and the expression level of a gene specific to each cell types should be positively related to the proportion of the cell type. In addition, PBMCs is a group of immune associated cells, each of which may have cell type specific genes related to RA. Therefore, we believe that PBMCs could be commonly used as target cells for RA study, especially in initial discovery, because this would increase the possibility of finding a significant gene. Of course, the identified genes in PBMCs in initial discovery should be further investigated in cell subgroups. But on the other hand, we should realize that the use of PBMCs might also hide differences in gene expression if one cell type has increased and another one decreased expression of a gene.
In summary, the present study found many RA-associated m6A-SNPs which may play important roles in the pathogenesis of RA and demonstrated the potential functionality of these SNPs. The findings increase our understanding on the regulation patterns of SNP, and may provide new insight into the mechanism underlying the associations between SNPs and RA. Future studies are needed to confirm the m6A-SNP associations and to elucidate the mechanisms.
Data of RA GWAS was publicly available at http://plaza.umin.ac.jp/~yokada/datasource/software.htm. The m6A-SNPs data can be downloaded from the m6AVar database (http://m6avar.renlab.org/). The three public gene expression datasets, GSE15573, GSE17755, and GSE1919 can be downloaded from GEO database (https://www.ncbi.nlm.nih.gov/geo/). The raw data (excel file) supporting the conclusions of the eQTL analysis is included in the supplementary files.
S-FL and Y-HZ conceived and designed the study. X-BM analyzed the data. X-BM wrote the paper.
The study was supported by Natural Science Foundation of China (31401079, 81473046 and 81373010), the Startup Fund from Soochow University (Q413900313), Project funded by China Postdoctoral Science Foundation (2014M551649), and a Project of the Priority Academic Program Development of Jiangsu Higher Education Institutions.
Conflict of Interest Statement
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fgene.2018.00299/full#supplementary-material
Aletaha, D., Neogi, T., Silman, A. J., Funovits, J., Felson, D. T., Bingham, C. O. III., et al. (2010). 2010 Rheumatoid arthritis classification criteria: an American College of Rheumatology/European League Against Rheumatism collaborative initiative. Arthritis Rheum. 62, 2569–2581. doi: 10.1002/art.27584
Edupuganti, R. R., Geiger, S., Lindeboom, R. G. H., Shi, H., Hsu, P. J., Lu, Z., et al. (2017). N(6)-methyladenosine (m6A) recruits and repels proteins to regulate mRNA homeostasis. Nat. Struct. Mol. Biol. 24, 870–878. doi: 10.1038/nsmb.3462
Eyre, S., Bowes, J., Diogo, D., Lee, A., Barton, A., Martin, P., et al. (2012). High-density genetic mapping identifies new susceptibility loci for rheumatoid arthritis. Nat. Genet. 44, 1336–1340. doi: 10.1038/ng.2462
Lee, H. M., Sugino, H., Aoki, C., and Nishimoto, N. (2011). Underexpression of mitochondrial-DNA encoded ATP synthesis-related genes and DNA repair genes in systemic lupus erythematosus. Arthritis Res. Ther. 13:R63. doi: 10.1186/ar3317
Li, L. J., Fan, Y. G., Leng, R. X., Pan, H. F., and Ye, D. Q. (2018). Potential link between (m6A) modification and systemic lupus erythematosus. Mol. Immunol. 93, 55–63. doi: 10.1016/j.molimm.2017.11.009
Mitsunaga, S., Hosomichi, K., Okudaira, Y., Nakaoka, H., Kunii, N., Suzuki, Y., et al. (2013). Exome sequencing identifies novel rheumatoid arthritis-susceptible variants in the BTNL2. J. Hum. Genet. 58, 210–215. doi: 10.1038/jhg.2013.2
Newman, A. M., Liu, C. L., Green, M. R., Gentles, A. J., Feng, W., Xu, Y., et al. (2015). Robust enumeration of cell subsets from tissue expression profiles. Nat. Methods 12, 453–457. doi: 10.1038/nmeth.3337
Okada, Y., Diogo, D., Greenberg, J. D., Mouassess, F., Achkar, W. A., Fulton, R. S., et al. (2014a). Integration of sequence data from a consanguineous family with genetic data from an outbred population identifies PLB1 as a candidate rheumatoid arthritis risk gene. PLoS ONE 9:e87645. doi: 10.1371/journal.pone.0087645
Okada, Y., Terao, C., Ikari, K., Kochi, Y., Ohmura, K., Suzuki, A., et al. (2012). Meta-analysis identifies nine new loci associated with rheumatoid arthritis in the Japanese population. Nat. Genet. 44, 511–516. doi: 10.1038/ng.2231
Prevoo, M. L., van 't Hof, M. A., Kuper, H. H., van Leeuwen, M. A., van de Putte, L. B., and van Riel, P. L. (1995). Modified disease activity scores that include twenty-eight-joint counts. Development and validation in a prospective longitudinal study of patients with rheumatoid arthritis. Arthritis Rheum. 38, 44–48.
Raychaudhuri, S., Sandor, C., Stahl, E. A., Freudenberg, J., Lee, H. S., Jia, X., et al. (2012). Five amino acids in three HLA proteins explain most of the association between MHC and seropositive rheumatoid arthritis. Nat. Genet. 44, 291–296. doi: 10.1038/ng.1076
Stahl, E. A., Raychaudhuri, S., Remmers, E. F., Xie, G., Eyre, S., Thomson, B. P., et al. (2010). Genome-wide association study meta-analysis identifies seven new rheumatoid arthritis risk loci. Nat. Genet. 42, 508–514. doi: 10.1038/ng.582
Teixeira, V. H., Olaso, R., Martin-Magniette, M. L., Lasbleiz, S., Jacq, L., Oliveira, C. R., et al. (2009). Transcriptome analysis describing new immunity and defense genes in peripheral blood mononuclear cells of rheumatoid arthritis patients. PLoS ONE 4:e6803. doi: 10.1371/journal.pone.0006803
Ungethuem, U., Haeupl, T., Witt, H., Koczan, D., Krenn, V., Huber, H., et al. (2010). Molecular signatures and new candidates to target the pathogenesis of rheumatoid arthritis. Physiol. Genomics 42A, 267–282. doi: 10.1152/physiolgenomics.00004.2010
Ward, L. D., and Kellis, M. (2012). HaploReg: a resource for exploring chromatin states, conservation, and regulatory motif alterations within sets of genetically linked variants. Nucleic Acids Res. 40. D930–D934. doi: 10.1093/nar/gkr917
Zhang, L., Pei, Y.F., Fu, X., Lin, Y., Wang, Y. P., and Deng, H. W. (2014). FISH: fast and accurate diploid genotype imputation via segmental hidden Markov model. Bioinformatics 30, 1876–1883. doi: 10.1093/bioinformatics/btu143
Keywords: rheumatoid arthritis, M6A, genome-wide association study, major histocompatibility complex, gene expression
Citation: Mo X-B, Zhang Y-H and Lei S-F (2018) Genome-Wide Identification of N6-Methyladenosine (m6A) SNPs Associated With Rheumatoid Arthritis. Front. Genet. 9:299. doi: 10.3389/fgene.2018.00299
Received: 22 February 2018; Accepted: 16 July 2018;
Published: 03 August 2018.
Edited by:Rui Henrique, IPO Porto, Portugal
Reviewed by:Zhixiang Zuo, Sun Yat-sen University, China
Tomas J. Ekstrom, Karolinska Institutet (KI), Sweden
Copyright © 2018 Mo, Zhang and Lei. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Shu-Feng Lei, email@example.com