MicroRNA Variants and HLA-miRNA Interactions are Novel Rheumatoid Arthritis Susceptibility Factors

Genome-wide association studies have identified >100 genetic risk factors for rheumatoid arthritis. However, the reported genetic variants could only explain less than 40% heritability of rheumatoid arthritis. The majority of the heritability is still missing and needs to be identified with more studies with different approaches and populations. In order to identify novel function SNPs to explain missing heritability and reveal novel mechanism pathogenesis of rheumatoid arthritis, 4 HLA SNPs (HLA-DRB1, HLA-DRB9, HLA-DQB1, and TNFAIP3) and 225 common SNPs located in miRNA, which might influence the miRNA target binding or pre-miRNA stability, were genotyped in 1,607 rheumatoid arthritis and 1,580 matched normal individuals. We identified 2 novel SNPs as significantly associated with rheumatoid arthritis including rs1414273 (miR-548ac, OR = 0.84, p = 8.26 × 10−4) and rs2620381 (miR-627, OR = 0.77, p = 2.55 × 10−3). We also identified that rs5997893 (miR-3928) showed significant epistasis effect with rs4947332 (HLA-DRB1, OR = 4.23, p = 0.04) and rs2967897 (miR-5695) with rs7752903 (TNFAIP3, OR = 4.43, p = 0.03). In addition, we found that individuals who carried 8 risk alleles showed 15.38 (95%CI: 4.69–50.49, p < 1.0 × 10−6) times more risk of being affected by RA. Finally, we demonstrated that the targets of the significant miRNAs showed enrichment in immune related genes (p = 2.0 × 10−5) and FDA approved drug target genes (p = 0.014). Overall, 6 novel miRNA SNPs including rs1414273 (miR-548ac, p = 8.26 × 10−4), rs2620381 (miR-627, p = 2.55 × 10−3), rs4285314 (miR-3135b, p = 1.10 × 10−13), rs28477407 (miR-4308, p = 3.44 × 10−5), rs5997893 (miR-3928, p = 5.9 × 10−3) and rs45596840 (miR-4482, p = 6.6 × 10−3) were confirmed to be significantly associated with RA in a Chinese population. Our study suggests that miRNAs might be interesting targets to accelerate understanding of the pathogenesis and drug development for rheumatoid arthritis.


BACKGROUND
Rheumatoid arthritis (RA) is a chronic inflammatory disorder caused by the interaction between multiple factors including genetics, epigenetics, and the environment (van der Woude et al., 2010;Stahl et al., 2012;Cavalli and Heard, 2019). Twin studies estimate an RA heritability of ∼60% (Eyre et al., 2012;Westra et al., 2018). In the past 15 years, GWAS have identified several hundred RA-associated variants. However, the reported genetic variants only explain <40% of RA heritability Okada et al., 2012;Raychaudhuri et al., 2012;Westra et al., 2018). Furthermore, studies within non-European-derived samples are needed to aid understanding of RA molecular pathophysiology as different variants and, potentially, different disrupted pathways may be present in those sample sets. Additionally, studying SNPs in specific functional classes of genes or motifs may reveal specific pathogenic mechanisms not previously discovered through GWAS (Li et al., 2018). GWAS results indicate that 90% of diseaseassociated variants are located in non-coding regions, indicating that regulatory elements may play important roles in complex disease etiology, including RA (Maurano et al., 2012). Across regulatory elements, microRNAs are important targets to interrogate as miRNA SNPs, which may modify the expression of numerous genes. Moreover, miRNAs play pivotal roles in both innate and adaptive immunity (Lai et al., 2017;Zhu et al., 2020), sex-specific effects (Khalifa et al., 2016), and disease onset. (Chatzikyriakidou et al., 2012;Moran-Moguel et al., 2018;Karami et al., 2020). The interaction of the proteins involved in antigen presentation, such as HLA class I proteins have been extensively studied in which interactions between the alleles of the HLA haplotypes were found to affect the immune response levels and autoimmune disease susceptibility (Shiina et al., 2009). However, the interaction between HLA-genes with non-HLA regions, especially epigenetic factors like miRNA has not been widely investigated.
Recently, several preliminary RA-association studies focused on miRNAs have been conducted for Asian (Yang et al., 2011;Yang et al., 2012;Li et al., 2014;Fu et al., 2016;Yang et al., 2017), European (Hashemi et al., 2013), and African (Shaker et al., 2018) populations. However, these studies have very limited sample sizes and only included a subset of miRNAs without genotyping all miRNA common SNPs. Additionally, the previous RA GWAS in Han Chinese was conducted in 2014 (Jiang et al., 2014) using an array with poor coverage of miRNA SNPs. Therefore, we conducted an exhaustive study of functional miRNA SNPs in RA. miRbase annotated 1,920 primary miRNA and 2,883 mature miRNAs. From dbSNP153, there are >45,705 SNPs located in primary miRNAs and 17,570 in mature miRNA (Kozomara et al., 2019); however, only 733 of these SNPs segregate alleles at a moderate-high frequency (i.e. are common SNPs with MAF>1%) within primary miRNAs regions. Within specific populations, this number is expected to be substantially reduced to ∼200 SNPs, providing an opportunity to use a multiplex genotyping assay in our RA samples. Hence, we conducted a systemic association study between common East-Asian miRNA SNPs with RA in a large Han Chinese cohort to identify novel miRNA SNPs and miRNA epistatic interactions.

Precision Medicine Research Cohort in Shanghai Guanghua Hospital
Guanghua Hospital Precision Medicine Research Cohort (PMRC) is a hospital-based longitudinal cohort to investigate risk factors, genetic susceptibility, pharmacogenetics for rheumatology diseases such as RA, osteoarthritis, and ankylosing spondylitis. Healthy individuals are derived from those with an annual physical exam without rheumatological disease. Currently, PMRC has enrolled >30,000 disease patients and 10,000 healthy individuals as controls. We randomly selected 3,223 individuals including 1,625 seropositive (RF+ and anti-CCP+) RA and 1,598 controls from the PMRC. Written consent was collected prior to enrollment. Non-Han Chinese individuals were excluded from the study to avoid confounding by population stratification. All cases fulfilled the 2010 European League against Rheumatism-American College of Rheumatology criteria or 1987 American College of Rheumatology revised criteria for RA. All healthy controls were required do not have personal or family history of ankylosing spondylitis, rheumatoid arthritis, osteoarthritis, type 1 and 2 diabetes, chronic infection, or common cancers. We did not find a significant case/control difference between gender, smoking, drinking, BMI (1,607 RA and 1,580 normal individuals) while the RA cohort shows a slightly higher age compared with the normal cohort (p 0.035). We adjusted the above confounder in our association test between SNPs and RA traits.
The study was reviewed and approved by the Institutional Review Board of Guanghua Hospital (No: IRB12018-K-12) and all methods were performed in accordance with the relevant guidelines and regulations. The demographic and clinical characteristics of the whole sample are presented in Supplementary Table S1.

DNA Extraction, SNP Selection, and Genotyping
Genomic DNA was extracted from peripheral blood using the CB-kit (CoWin Biosciences, CWBIO., China) following the manufacturer's instructions. Allelic discrimination was automated using the manufacturer's software, which has been widely described in previous studies (Jiang et al., 2012). Internal positive and negative control samples were used and a test of Hardy-Weinberg equilibrium was employed to assess the genotyping quality (SNPs exceeding HWE p-value cutoff were excluded from RA association testing). Cases and controls were required to have a family history of at least three generations of residency in Shanghai or neighboring regions. The SNaPshot Multiplex System (Applied Biosystem, USA) was used for the genotyping which has been widely used in our previous studies (Huang et al., 2014;Shen et al., 2016).
To select SNPs for subsequent association testing, human microRNAs were downloaded from miRBase methods (Release 22.1). Mapping of SNPs within dbSNP (dbSNP153, 08/08/2019) was performed for human microRNA genomic regions, resulting in 40,602 SNPs. To select common SNPs (MAF>0.01), we collected the allele frequency for all the SNPs from Gnomad (Lek et al., 2016), Asian100K (GenomeAsia100K Consortium, 2019), and the 1,000 Genomes dataset (Maurano et al., 2012) The minor allele frequency was required to be higher than 0.01 in all datasets (1000Genomes Project Consortium et al., 2015. Only biallelic SNPs were included in the analysis, while tri-allelic SNPs were excluded from this study. Following filtering, 243 SNPs were obtained. In order to decrease the genotyping cost, one SNP from pairs of SNPs with high LD (R 2 > 0.8) were removed and only one of them was included in the genotyping assay. Four SNPs within known GWAS associations with RA were included: rs9268839 (HLA-DRB9) (Okada et al., 2014;Laufer et al., 2019), rs9275376 (HLA-DQB1) (Regueiro and Gonzalez, 2020;Guo et al., 2019;Song et al., 2014;Castro et al., 2001;van Kerckhove et al., 1991), rs4947332 (tag-SNP for HLA-DRB1*04:07) and rs7752903 (TNFAIP3) (Okada et al., 2014;Laufer et al., 2019) In addition, a panel of 4 ancestry informative markers (negative control) were selected to estimate the potential effects of population stratification: rs174583 (11q12.2), rs11745587 (5q31.1), rs521188 (1p31.3) and rs7740161 (6q23.2). These SNPs were derived from a previous Han Chinese population study (Qin et al., 2014) and are informative in comparing the population structure between South and North Han Chinese since the Shanghai region is a mixed population from different regions in China ( Figure 1A). Finally, a total of 233 SNPs were genotyped and 225 common miRNA SNPs were included in the novel RA-association SNP identification study.

Quality Control and Power Analysis
For genotyping QC, ∼2.5% samples (40 RA and 32 normal) were randomly selected for repeat genotyping. The concordance between initial and replicate genotyping was 99.98%. SNP imputation was accomplished through the Michigan Imputation Server (https://imputationserver.sph.umich.edu/ index.html) with East Asian samples from the Genome Asian Pilot (GAsP) since recent evidence shows that the GAsP reference panel showed higher imputation accuracy (93-95%) compared with 1,000 Genome Asian panel (<90%) (Belsare et al., 2019). The missing ratio between case and control differential test and HWE in control samples were evaluated. SNPs were removed when p < 0.01. GRCH37/hg19 was used to determine genomic positions and R 2 > 0.6 as the cut-off to selected imputed SNPs of high quality and located in miRNA regions.
In the power analysis, given the prevalence of RA (p 0.01), case (n 1,607) and control (n 1,580) size, risk allele frequency (f 0.1) and genetic effect size (OR 1.5), the power for multiplicative model, additive model, dominant models were 0.856, 0.823 and 0.742, respectively using a significance level of alpha 5 × 10 −3 . In addition, with our design and the above effect size, we calculate that SNPs with an MAF of 0.025 can attain a statistical power of 80% with nominal p < 0.05. Therefore, our study could provide reliable genetic association results for the majority of the alleles we have included given that the MAF of 201 (86.3%) SNPs are >10% and MAF of 96% included SNPs exceed 2.5%.

Association Analysis, Power Calculations, Epistasis Analysis, and Cumulative Risk Analysis
The association for each SNP was calculated and OR estimated (95% CI) using Chi-square test, Fisher's Exact test (N < 5), Cochran-Armitage Trend test, and Bayesian logistic regression (BLR) (Banerjee et al., 2018) adjusted for gender (binary), age (continuous), BMI (continuous), smoking (binary) and drinking history (binary), using the bayesm (version: 3.1.4) package Frontiers in Genetics | www.frontiersin.org October 2021 | Volume 12 | Article 747274 (Banerjee et al., 2018). To explore the power of the study, a Monte Carlo simulation was performed under different models including dominant, recessive, and additive models, additive models, using a Chi-square test or Bayesian logistic regression model. We performed fixed-effects and random-effects meta-analysis with plink between our dataset and Okada's dataset (Okada et al., 2014) to identify novel RA-associated miRNA-SNPs by large sample size and statistical power. SNP-SNP interactions were analyzed by using traditional point-wise interaction analysis based on SNPassoc (Landa et al., 2009), logistic test (Boulesteix et al., 2007), "fast-epistasis" in Plink (Purcell et al., 2007). We applied Y ∼ b0 + b1*A+ b2*B+ b3*AB+ e to detect epistasis effect between SNP A and SNP B when b3 not equal 0 (null hypothesis: b3 equal 0). A p < 0.001 was considered significant. A cumulative RA risk prediction model was fitted with the 6 most significant SNPs (rs1414273, rs4947332, rs9268839, rs9275376, rs7752903, and rs2620381) by both a Chi-square and logistic regression test to adjust by the aforementioned covariates and in which individuals carrying 1 risk allele were used as the reference. Multiple test correction was conducted through a False Discovery Rate (FDR) implemented in R with the function p. adjust) (Ghosh, 2019). SNP derived miRNA target gain or loss were imputed by miRNA-SNP database (Gong et al., 2015) with the default setting. All the statistical analyses were performed with R (version: 3.6.1).

Bioinformatics Analysis of miRNA Regulatory Networks and Enrichment Analysis
To understand the regulatory networks driven by miRNAs and the overlap with RA GWAS genes and immune system genes, a miRNA regulatory network analysis was performed with miRDB (Chen and Wang, 2020). Genes predicted to be regulated by significant miRNAs were calculated among 123 previouslyidentified RA GWAS genes, 4,723 immune system genes collected by InnateDB database (Breuer et al., 2013), and 373 FDA-targeted pharmacogenes. A target sore threshold was set to 95 in miRDB to identify reliable gene targets for the miRNAs. Cytoscape was employed to construct and visualize the miRNA-mRNA network. We calculated a hypergeometric test of the enrichment in miRNA targets within RA GWAS significant genes, immune-related genes, and FDA approved drug targets. A random sampling technique was applied to assess the null distribution of p-values (n 50,000 iterations). 37,875 total genes in the genome (GENCODE v32) were assumed for the calculations. Enrichment was measured with fold-change (FC) and p-values were calculated by summing the frequency of more extreme values in the null distribution of p-values.

SNP Selection, Genotyping, and Population Structure
Overall, the genotyping generated high quality data with 94% of the SNPs showing high genotyping quality. Seven of the SNPs were removed from being promoted to the RA association evaluation due to them having a genotyping ratio of less than 99.0%. The missing ratio between case and control was calculated. Hardy-Weinberg equilibrium in control samples was evaluated and another 3 SNPs were removed when p < 0.01. Finally, 223 SNPs in 1,607 rheumatoid arthritis 1,580 normal individuals were used in the association analysis (genotyping rate 98.88%). PCA analysis based on these SNPs showed our samples clustered with the East Asian population ( Figure 1B). Furthermore, we did not identify significant age, BMI, drinking, and smoking history differences between cases and controls (p > 0.01, Supplementary Table S1). However, as a conservative measure, association testing of all the variants included age, gender, BMI, drinking, and smoking as covariates in the Bayesian logistic regression (BLR)-based tests.
Epistasis Analysis to Identify miRNA Interactions miRNAs play multiple regulatory roles in both the modification of target gene expression and larger regulatory networks. To Frontiers in Genetics | www.frontiersin.org October 2021 | Volume 12 | Article 747274 identify epistatic effects between miRNAs and their role in the susceptibility of RA, we applied an epistasis analysis to reveal the interaction between the above identified miRNA SNPs. We found 19 SNP-SNP epistatic interactions with p < 7.3 × 10 −4 , indicating significant interactions (Supplementary Table S3). We found 10 SNP-SNP pairs that showed a significantly strengthened interaction (OR>1) while nine SNP-SNP pairs showed impaired interaction (OR<1). In addition, enhanced risk epistasis was also identified across HLA alleles (rs4947332 in HLA-DRB1 and rs9275376 in HLA-DQB1) between HLA and miRNAs (HLA-DRB1 and MIR3928), and between TNFAIP3 and MIR5695 ( Table 4). A significant positive interaction between rs4947332 (HLA-DRB1) and rs5997893 (MIR3928) with a significantly inflated OR 2.83 (95%CI: 1.75-4.58, p 1.36 × 10 −5 , Table 4) for double risk allele carriers, indicating the importance of HLA and non-HLA genetic variation interaction in RA susceptibility.

Cumulative Analysis Revealed Increased Risk Effect on Rheumatoid Arthritis
To estimate the combined effect of the 6 risk alleles (rs1414273T, rs4947332T, rs9268839G, rs9275376T, rs7752903G, and rs2620381A) on RA risk, the individual accumulation of risk alleles was treated as an ordinal variable in a logistic regression analysis adjusted by BMI, age and gender. As anticipated, we found that the RA risk showed a positive correlation with the cumulative number of risk alleles (OR 1.4, p 2.0 × 10 −16 , Z 12.54, SE 0.027,

DISCUSSION
In this study, 225 East-Asian common SNPs located in human microRNA seed regions were genotyped and association and   matched controls. The study identified 6 novel RA-associated miRNA SNPs (rs1414273 in miR-548ac, rs2620381 in miR-627, rs4285314 in miR-3135b, rs28477407 in miR-4308; rs5997893 in miR-3928 and rs45596840 in miR-4482) and revealed the interaction between HLA alleles and miRNA SNPs, thereby advancing understanding of RA pathogenesis.
We found that rs45596840 and rs2620381 located in seed regions of miR-4482 and miR-627, respectively, are significantly associated with RA. rs9268839-G (HLA-DRB9) was shown as a risk allele with OR 2.47 in Caucasian populations and we found that the OR is 1.72, indicating a consistent risk effect that needs to be validated in African populations. According to the miRNA-mRNA binding imputation, rs45596840 (miR-4482) and rs2620381 (miR-627) will affect more than 5,894 (2,132 gain and 3,517 loss, see URL) and 4,845 (2,593 gain and 2,252 loss, see URL) mRNA-miRNA bindings respectively. These FIGURE 2 | Genomic position and functional annotation to the significant SNPs in miRNAs. rs1414273 (C/T) in miR-548ac; rs2620381 (A/C) in miR-627-5p which cause 2,056 target gain and 1,440 target loss. rs4285314 (G/A) in miR-3135b; rs28477407(C/T) in miR-4308; rs5997893 (A/G) in miR-3928-5p and rs45596840 (G/A) in miR-4482-5p which will cause 1,677 target gain and 1,456 target loss. Target gain or loss were imputed by miRNA-SNP database (Gong et al., 2015) (see method). Five significant epistasis effects were selected to show the increased or decreased risk effect. rs4947332 located in C2 upstream (HLA-DRB1); rs9275376 located in HLA-DQB1 upstream; rs5997893 located in miR-3928; rs7752903 located downstream of TNFAIP3 and rs2967897 located in miR-5695.
Frontiers in Genetics | www.frontiersin.org October 2021 | Volume 12 | Article 747274 miRNA target changes might alter the immune response significantly. In addition, the common regulation network between miR-4707 and miR-627 with CD244, CAMTA1 might be of interest. rs1414273 is located within the miR-548ac stem-loop sequencing in the first intron of the CD58 gene, which has shown a strong linkage disequilibrium with the Multiple Sclerosis (MS)-associated haplotype (Hecker et al., 2015). Hecker et al. found that SNP rs1414273 might alter Drosha cleavage activity to provoke partial uncoupling of CD58 gene expression and microRNA-548ac production from the shared primary transcript in immune cells and regulate the inflammatory processes and the balance of protein folding and degradation (Hecker et al., 2019). However, CD58-CD2 interactions recruiting lymphocytes to inflammatory sites play a crucial role in autoimmune disease (Hoffmann et al., 1996;Sable et al., 2016). Finally, in our previous research, we found that DNA methylation of CD4 + cells in rheumatoid arthritis showed abnormal DNA methylation in HLA regions (Guo et al., 2017). The current study found that interactions between HLA-miRNA contributed to RA susceptibility. Overall, the interaction between regulation/epigenetics (Guo et al., 2017) and genetic variation in RA is a promising field for future research.
There are limitations to this research. Due to the scope of the study, only a limited number of ancestry-informative SNPs were used to control for confounding by population stratification. To validate the RA phenotyping, we used 4 East-Asian GWAS RA-associated variants as positive controls for RA samples. All the enrolled individuals were from a three-generation Han Chinese family without genetic admixture with other non-Han ethnic individuals to avoid population stratification. We evaluated the power with Harvard Power Estimation Tool to handle multiple test correction issues with the following parameters: A 0.05, p 0.01, OR (Aa) 1.5, OR (AA) 3, N 1,600, RCC 1, p 2.2 × 10 −4 . We found that the power for dominant, recessive, and additive models are 0.59, 0.55, and 0.66 respectively. To receive a power of 0.8, the sample size was required to be larger than 2,100, 2,200, and 1,900. Since we undertook an extra metaanalysis where 4,873 RA cases and 17,642 normal are included, we believe this power is enough to make a corresponding conclusion. This study did not provide functional validation of these miRNAs, which is important to show biological validation of the miRNA findings and understand the mechanisms by which these miRNAs are involved in RA susceptibility. Subsequent studies should focus on the functional exploration of these miRNAs. Overall, we identified 6 miRNA SNPs that are significantly associated with the Han Chinese RA population. Two of these SNPs are located in the seed region of their miRNAs and are expected to cause target gain/loss, while another three SNPs are located in the loop/pre-miRNA regions which may influence the stability of corresponding miRNAs.

CONCLUSION
Genome-wide association studies have identified several hundred genetic risk factors for rheumatoid arthritis. However, the reported genetic variants only explain less than 40% heritability of rheumatoid arthritis. Hence, most of the heritability for rheumatoid arthritis liability remains missing. Furthermore, heterogeneity in rheumatoid arthritis susceptibility across populations is poorly understood. Interrogation of the genetic architecture of rheumatoid arthritis within non-European-derived populations may provide additional insight into the molecular etiology of the disease, revealing additional pathways that may contribute to the pathogenesis. In this study, we investigated the association between all known common SNPs within East Asian populations located in miRNAs and subsequently tested for the association of these SNPs with rheumatoid arthritis within a well-characterized sample set of Han Chinese origin (1,625 RA and 1,598 controls). We found two significant miRNA SNPs (rs1414273 in miR-548ac and rs2620381 in miR-627) significantly associated with RA in our dataset. A metaanalysis with previous GWAS studies, including 4,873 RA cases and 17,642 controls, discovered another 4 novel miRNA RA-associated SNPs (rs4285314 in miR-3135b, rs28477407 in miR-4308, rs5997893 in miR-3928, and rs45596840 in miR-4482). In addition, we identified numerous HLA-HLA and HLA-miRNA epistatic effects, indicating the importance of interactions between genetic variants in HLA and miRNA systems in susceptibility to RA. We found that individuals who carried 8 risk alleles were estimated to be at 15.38-fold increased risk of RA. Finally, we found that the targets of the significant miRNAs showed enrichment in immune related genes and FDA approved drug target genes, indicating miRNAs might be interesting targets to accelerate understanding of the pathogenesis and drug development for rheumatoid arthritis. Our study suggested that the interaction between HLA and functional SNPs in human regulatory elements may be a fruitful avenue of investigation in further understanding susceptibility to RA. 1.00 × 10 −06 a individuals carrying 0 or more than 9 risk alleles were not found in our dataset. We only found 8 individuals carrying 9 risk alleles (5 RA and 3 normal), which is not stable for the estimation and therefore ignored in this table. rs1414273 (T), rs4947332 (T), rs9268839 (G), rs9275376 (T), rs7752903 (G) and rs2620381 (A) were included in the analysis. Risk alleles are shown in the parentheses following SNPs.
Frontiers in Genetics | www.frontiersin.org October 2021 | Volume 12 | Article 747274 8 FIGURE 3 | miRNA target analysis and target distribution in GWAS significant genes, immune genes, and FDA drug target genes. (A). Network connection between the top 6 miRNA predicted regulatory targets and previous GWAS identified 123 RA associated genes. (B). Network connection between the top 6 miRNA predicted regulatory targets and 4,723 InnateDB collected immune genes. (C) Network connection between the top 6 miRNA predicted regulatory targets and 672 FDA approved drug target genes. miRNA was labeled as diamond filled with green color and gene were as a circle filled with yellow color.
Frontiers in Genetics | www.frontiersin.org October 2021 | Volume 12 | Article 747274 9 DATA AVAILABILITY STATEMENT The original contributions presented in the study are included in the article/Supplementary Material, further inquiries can be directed to the corresponding authors.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the Ethics Committee of Shanghai Guanghua Hospital of Integrated Traditional Chinese and Western Medicine. The patients/participants provided their written informed consent to participate in this study.