A Comparison of Co-methylation Relationships Between Rheumatoid Arthritis and Parkinson's Disease

Rheumatoid arthritis (RA) is a complex autoimmune disease. Recent studies have identified the DNA methylation loci associated with RA and found that DNA methylation was a potential mediator of genetic risk. Parkinson's disease (PD) is a common neurodegenerative disease. Several studies have indicated that DNA methylation levels are linked to PD, and genes related to the immune system are significantly enriched in PD-related methylation modules. Although recent studies have provided profound insights into the DNA methylation of both RA and PD, no shared co-methylation relationships have been identified to date. Therefore, we sought to identify shared co-methylation relationships linked to RA and PD. Here, we calculated the Pearson's correlation coefficient (PCC) of 225,239,700 gene pairs and determined the differences and similarities between the two diseases. The global co-methylation change between in PD cases and controls was larger than that between RA cases and controls. We found 337 gene pairs with large changes that were shared between RA and PD. This co-methylation relationship study represents a new area of study for both RA and PD and provides new ideas for further study of the shared biological mechanisms of RA and PD.


INTRODUCTION
Rheumatoid arthritis (RA) is a chronic autoimmune disease (Adkar et al., 2017;Nair et al., 2017;Choudhary et al., 2018;Liu and Page, 2018;Mizoguchi et al., 2018) with multiple environmental risk factors, including lifestyle factors, hormones, infections, and smoking, as well as interactions between genes and the environment (Klareskog et al., 2009;Scott et al., 2010;Karlson and Deane, 2012;Aleyd et al., 2016;Choudhary et al., 2018;Soulaidopoulos et al., 2018). Parkinson's disease (PD) (Antony et al., 2013;Goldman, 2014), like Alzheimer's disease (AD) (Zhou et al., 2018a), is a common neurodegenerative disease. The most important pathological change of PD is the degeneration of dopamine (DA) neurons in the midbrain, which causes a significant decrease in the striatum DA content and results in disease (Antony et al., 2013). The exact cause of this pathological change is still unclear. Genetic factors, environmental factors, aging, and oxidative stress may be involved in the degenerative death of PD dopaminergic neurons (Goldman, 2014).
In recent years, many studies have paid close attention to epigenetic mechanisms (Cui et al., 2018), especially DNA methylation (Liu et al., 2013;Davegårdh et al., 2018;Meehan et al., 2018). Epigenomic differences can represent phenotypic differences resulting from environmental exposure (Viatte et al., 2013). Increasingly, DNA methylation has been investigated as a potential diagnostic biomarker (Nair et al., 2017). DNA methylation occurs when a methyl group is added to cytosine DNA nucleotides to form 5-methylcytosine. This process is catalyzed by methyltransferases (Nair et al., 2017). A large number of studies have shown that DNA methylation can cause changes in chromatin structure, DNA conformation, DNA stability and DNA-protein interactions, to control gene expression (Hannon et al., 2016;Rao et al., 2018).
In 2013, Liu et al. analyzed the genome-wide DNA methylation differences in peripheral blood leukocytes (PBLs) from RA patients and normal controls (Liu et al., 2013). They applied the Causal Inference Test, with genotype as a causal factor, DNA methylation as a potential mediator and RA as the outcome, to identify the differential methylation positions (DMPs) associated with the RA phenotype. They found 264 unique single nucleotide polymorphisms (SNPs) and nine unique DMPs, comprising 535 SNP-DMP pairs, within the major histocompatibility complex region. Those SNP-DMP pairs represent potential methylation-mediated relationships between SNPs and RA disease risk, revealing that DNA methylation is a potential mediator of genetic risk. Many studies have indicated there are strong associations between neurodegenerative diseases and autoimmune diseases (Zhou et al., 2018b). In 2017, Chuang et al. found that PD status had a profound association with DNA methylation levels in blood and saliva, and three out of six PDrelated CpG clusters in blood were significantly enriched for genes related to the immune system (Chuang et al., 2017).
Although recent studies have provided profound insights into the DNA methylation of RA and PD, they have not considered the interaction relationships between genes. RA and PD, as complex diseases, may be caused by the interaction of multiple genes (Yi et al., 2017). Based on this concept, in this study we analyzed one-to-one gene co-methylation relationships to further dissect common molecular mechanisms between RA and PD.

DNA Methylation Data for RA
In this study, we obtained RA methylation data (GSE42861) from the NCBI GEO database. The platform was GPL13534 (Illumina HumanMethylation450 BeadChip, including 485,577 probes). GSE42861 contains 354 RA samples and 335 normal control samples. Bisulphite converted DNA was from PBLs. The matrix file of GSE42861 includes the information for samples and normalized beta values (from 0 to 1) for each sample.

DNA Methylation Data for PD
We also obtained PD methylation data (GSE111629) from the NCBI GEO database. The platform was also GPL13534. The matrix file includes 572 samples (335 PD samples and 237 normal control samples from whole-blood DNA) and normalized beta values (from 0 to 1) for each sample.

Mapping of Methylation Positions to Genes
In this study, we analyzed genes on a one-to-one basis to compare the global co-methylation landscapes. We used the same data processing for RA and PD. We mapped 485,577 probes in the GPL13534 platform to 21,225 genes. If there were multiple probes in a gene, we calculated the average beta value of those probes as the methylation value of the gene. The final RA dataset used for subsequent analysis consisted of 21,225 genes and 689 samples (including 354 RA samples and 335 normal control samples), and the final PD dataset included 21,225 genes, 335 PD samples, and 237 normal control samples.

The Co-methylation Coefficient
We used the Pearson's correlation coefficients (PCC) to measure the co-methylation relationships between gene pairs. We defined the PCC as: where r(i,j) is the PCC between gene i and gene j; n is the number of samples; b i,k is the methylation beta value of gene i in sample k; b i,k is the methylation beta value of gene j in sample k; b i is the mean of the methylation beta value of gene i; b j is the mean of the methylation beta value of gene j; S i is the standard deviation (SD) of the methylation beta value of gene i; and S j is the SD of the methylation beta value of gene j. When r(i,j) >0, gene i and gene j represent a positive correlation, that is, gene i and gene j are highly co-methylated. When the r(i,j) <0, gene i and gene j represent a negative correlation. In other words, the methylation levels of gene i and gene j are exactly opposite.

Comparison of the Distribution of the PCC Between RA and PD
We calculated the PCC of 225,239,700 gene pairs for RA cases and RA controls. The mean PCC of the RA cases was 0.174 and the SD was 0.250, while the mean PCC of the RA controls was 0.132 and the SD was 0.260. We then analyzed their distribution and plotted them on a graph ( Figure 1A). Both the RA cases and controls showed a similar unimodal distribution of the global PCC.
We also calculated the PCC for the dataset of PD cases and controls. The mean PCC of the PD cases was 0.334 and the SD was 0.209, while the mean PCC of the PD controls was 0.414 and the SD was 0.194. Their distribution is shown in Figure 1B, where the global change of PD cases and controls is larger than that of RA cases and controls.

Identifying Large Change Gene Pairs for RA
We divided the PCC into eight intervals, named strongest negative correlation (−1 to −0.75), strong negative correlation (−0.75 to −0.5), weak negative correlation (−0.5 to −0.25), weakest negative correlation (−0.25 to 0), weakest positive correlation (0 to 0.25), weak positive correlation (0.25 to 0.5), strong positive correlation (0.5 to 0.75), and strongest positive correlation (0.75 to 1). For each interval of the RA controls, we analyzed the changes in the co-methylation relationship from controls to cases. The results are presented in Table 1.
For RA, there were five gene pairs (ARHGAP17 and TSHZ3, DDX54 and TSHZ3, GATA3 and GRB2, GRB2 and TSHZ3, and PIGC and SHZ3) where the PCC changed from a strong negative correlation to a weak positive correlation, and 13 gene pairs where the PCC changed from a weak negative correlation to a strong positive correlation. These 18 gene pairs crossed four intervals and recorded significant changes. There was also a total of 1,893 gene pairs crossing no less than three intervals with large changes of the co-methylation relationship from controls to cases. These are shown in Supplementary File S1.

Identifying Large Change Gene Pairs for PD
For PD, we used the same method to divide the PCC into eight intervals, and changes of the co-methylation relationship from controls to cases are shown in Table 2.
From the PD controls to the PD cases, 15 gene pairs recorded the largest changes of the co-methylation relationship, crossing five intervals. These included two gene pairs (HIST1H2BH and NFKBIE, and HIST1H3F and NFKBIE) where the PCC changed from the strongest positive correlation to a weak negative correlation, and 13 gene pairs where the PCC changed from a strong positive correlation to a strong negative correlation.   Furthermore, 2,468 gene pairs crossed four intervals, including one gene pair from a weak negative correlation to a strong positive correlation, six gene pairs from the weakest positive correlation to the strongest negative correlation, 594 gene pairs from a weak positive correlation to a strong negative correlation, 1,807 gene pairs from a strong positive correlation to a weak negative correlation, and 60 gene pairs from the strongest positive correlation to the weakest negative correlation.
In total, 196,311 gene pairs crossed no less than three intervals in PD, shown in Supplementary File S2. There was obviously a larger difference in the partial co-methylation relationship between controls and cases in PD than in RA.

Shared Gene Pairs Between RA and PD
To identify the genetic mechanisms shared in both RA and PD, we extracted the gene pairs that changed across no less than three intervals. We extracted 1,893 gene pairs from RA and 196,311 gene pairs from PD. We found there were 337 gene pairs (in Supplementary File S3) shared between RA and PD. We extracted 28 gene pairs with an absolute value of difference >0.8 from RA and 4,748 gene pairs from PD. Nine of these gene pairs (shown in Table 3) were shared in RA and PD, and these nine gene pairs were included in the 337 gene pairs extracted above.

Gene Ontology and KEGG Pathway Analysis of Shared Gene Pairs
To further analyze the functional characteristics of the shared gene pairs, we extracted 220 unique genes from the 337 gene pairs by removing duplicate genes, and then annotated them to GO (Gene Ontology) categories and KEGG pathways using DAVID. This found 70 GO categories and four KEGG In addition, these genes were also annotated into two immune-related KEGG pathways (hsa04662: B cell receptor signaling pathway and hsa05340: primary immunodeficiency). As discussed above, 220 unique genes from the 337 gene pairs play important roles in autoimmune regulation and brain development.

DISCUSSION
In summary, we calculated the PCC of each gene pair and compared the distribution of the global PCC between cases and controls. We found a similar distribution between cases and controls. Nevertheless, we still found 1,893 gene pairs in RA and 196,311 gene pairs in PD whose co-methylation relationship changed a great deal. Moreover, there were 337 gene pairs that changed across no less than three intervals, and nine gene pairs with an absolute value of difference >0.8, shared between RA and PD. By Gene Ontology and KEGG pathway analysis, we found 220 unique genes from the 337 gene pairs that play important roles in autoimmune regulation and brain development. There are therefore potential pathogenic and genetic mechanisms shared between RA and PD. In our study, to assess the reliability of the results and the repeatability of the co-methylation relationships, and to determine how many samples are required for a repeat experiment, we randomly selected five groups of genes with 100 genes in each gene group without returning to the samples. We randomly selected samples from each group (the sample size ranged from 2 to 150) and calculated the PCC of each gene pair.
To examine the repeatability, we measured the similarity between replicates of the same operator by selecting the same number of samples from the remaining samples and recalculating the PCC. For each of the five groups, the PCC between the first experiment and the second replicate is shown in Figures 2A (RA cases) and 2B (RA controls). In both RA cases and controls, we found that the PCC increased with increasing sample size. Most of the PCCs were >0.9 when the sample size was >150. In other words, the PCC is repeatable when the sample size is relatively large.