Novel Insights Into Rheumatoid Arthritis Through Characterization of Concordant Changes in DNA Methylation and Gene Expression in Synovial Biopsies of Patients With Differing Numbers of Swollen Joints

In this study, we sought to characterize synovial tissue obtained from individuals with arthralgia and disease-specific auto-antibodies and patients with established rheumatoid arthritis (RA), by applying an integrative multi-omics approach where we investigated differences at the level of DNA methylation and gene expression in relation to disease pathogenesis. We performed concurrent whole-genome bisulphite sequencing and RNA-Sequencing on synovial tissue obtained from the knee and ankle from 4 auto-antibody positive arthralgia patients and thirteen RA patients. Through multi-omics factor analysis we observed that the latent factor explaining the variance in gene expression and DNA methylation was associated with Swollen Joint Count 66 (SJC66), with patients with SJC66 of 9 or more displaying separation from the rest. Interrogating these observed differences revealed activation of the immune response as well as dysregulation of cell adhesion pathways at the level of both DNA methylation and gene expression. We observed differences for 59 genes in particular at the level of both transcript expression and DNA methylation. Our results highlight the utility of genome-wide multi-omics profiling of synovial samples for improved understanding of changes associated with disease spread in arthralgia and RA patients, and point to novel candidate targets for the treatment of the disease.


INTRODUCTION
Rheumatoid arthritis (RA) is a complex, multifactorial, and chronic autoimmune disease that primarily affects the synovial tissue in joints (1). It affects about 1% of the population and manifests with significant unmet medical need (2). Investigating pathogenesis during different stages of disease, or across a spectrum of disease severity, is critical to optimize appropriate therapeutic interventions that affect disease progression (3,4).
Diagnosis of established RA coincides with the development of painful and swollen joints, although circulating autoantibodies can be detected up to 10 years before diagnosis (5,6). Clinically manifested joint swelling reflects synovial tissue inflammation (synovitis), which is characterized by infiltration into the synovium of multiple immune cell types [including T cells, B cells, macrophages, plasma cells, dendritic cells, natural killer (NK) cells, and mast cells (7)], with up to 18 distinct infiltrating cell populations being reported on in a recent single cell transcriptomics analysis (8). As disease progresses, synovial fibroblasts adopt an increasingly aggressive and invasive phenotype, promoting further inflammation and joint damage together with other processes induced by the inflammatory environment, such as the differentiation of bone-resorbing osteoclasts (9,10). Disease progression in early RA is often associated with the involvement of an increasing number of inflamed joints, but the mechanisms responsible for this spread of disease are poorly understood. Moreover, differences in the rate of disease manifestation and variability of response to therapy indicate that different pathophysiological mechanisms are implicated in disease development and progression compared to disease etiology (11).
An increasing body of evidence indicates that epigenetic modifications play an important role in the regulation of RA pathogenesis (12). Several array-based studies have reported widespread differences in DNA methylation among peripheral blood cells from RA patients and controls, suggesting that epigenetic modifications in circulating cells associate with disease (13,14). However, wider conclusions may be limited by the unknown correlation of these effects to synovial cells directly at the site of inflammation. Epigenetic modifications have also been implicated in modulating the function of synovial fibroblasts in RA, through comparisons of DNA methylation patterns of cultured cells isolated from RA and osteoarthritis patients (15)(16)(17)(18)(19)(20). Such studies have identified DNA methylation patterns that distinguish RA from other forms of arthritis, along with regulatory elements and biomarkers related to the pathological phenotype of RA. However, to identify novel candidate genes for therapeutic interventions that affect disease progression, it is important to study samples from patients across different stages of disease. Two such studies using cultured fibroblast-like synoviocytes found small but statistically significant patterns of hypomethylation in patients with longstanding RA compared to those with early RA, suggesting that the DNA methylome could be associated with the transformation of synovial fibroblasts into invasive cells capable of joint destruction and the resulting disease progression (21,22).
Here, we gain insights into RA heterogeneity by combining whole-genome DNA methylome and transcriptome analyses from the same synovial biopsies and stratifying patients according to the number of swollen joints, a clinical parameter reflective of disease evolution. We find that swollen joint count based on 66 joints (SJC66), which reflects the amount and spread of inflamed synovial tissue, is associated with major changes in gene transcription and DNA methylation at promoters. By crossinterrogating differentially methylated genomic regions and their associated genes, we reveal novel candidate loci associated with the spread of the disease across joints.

Sample Description
Synovial tissue was collected via a mini-arthroscopic procedure as described previously (23) from patients at the Amsterdam University Medical Center, University of Amsterdam. A total of 17 samples were obtained from three cohorts. The first cohort contained individuals that had either arthralgia and/or a positive family history for RA, but without arthritis (as determined by a clinician), and that were positive for IgM Rheumatoid Factor (IgM-RF) and/or Anti-Citrullinated Protein Antibodies (ACPA) (Pre-synoviomics; n = 4) (24). The second cohort consisted of individuals that at inclusion were Disease Modifying Anti-Rheumatic Drug (DMARD)-naïve with early arthritis, as defined by a disease duration of less than 1 year (Synoviomics; n = 9) (25,26). The third cohort contained samples from patients with established RA on active treatment with a disease duration of more than one year who had at least one swollen joint suitable for synovial tissue sampling (Synoviomics II; n = 4) (27). For all analyses, samples were prepared simultaneously to mitigate batch effects.
All subjects provided written informed consent and the collection and use of the samples received Institutional Review Board review and approval. Characteristics of patients included in this study are listed in Table S1.

RNA-Sequencing and Whole Genome Bisulphite Sequencing
Flash-frozen synovial tissue biopsies were utilized to simultaneously isolate RNA and DNA using an AllPrep DNA/ RNA Mini kit (Qiagen), with QIAshredder spin columns (Qiagen) used to disrupt the tissue. RNA samples were quantified and their integrity assessed using Qubit RNA Broad Range Assay Kit (Thermo Fisher Scientific) and an Agilent 2100 Bioanalyzer RNA 6000 Nano Kit (Agilent Technologies), respectively. Depending on sample yield, DNA samples were quantified using Qubit DNA BR or Qubit DNA HS kits (Thermo Fisher Scientific).
RNA-Seq libraries were generated from 150 ng of total RNA. The TruSeq ® Stranded Total RNA LT was used with a Ribo-Zero ™ Human/Mouse/Rat kit (Illumina), following the 'Low Sample' protocol except for two modifications. Firstly, the time of the 'Elution 2 -Frag -Prime' program was reduced from 8 to 6 min to increase the length of the RNA fragments. Secondly, 11 instead of 15 cycles were used to enrich the DNA fragments. Libraries were quantified with a KAPA Library Quantification Kit (KAPA Biosystems) on a QuantStudio 12 K Flex Real-Time PCR System (Thermo Fisher Scientific). Five samples were multiplexed per lane and libraries were clustered and sequenced using HiSeq ® PE Cluster Kit v3 -cBot ™ and HiSeq ® SBS Kit v3 kits (Illumina). Paired-end sequencing (2 x 76 bp) was performed using a HiSeq 1500 (Illumina) to a depth of~40 M reads per sample.
Whole Genome Bisulphite Sequencing (WGBS) libraries were generated using an EpiGnome Methyl-Seq Library Preparation Kit (Epicentre, now Illumina) from 100 ng of sample DNA. Bisulphite conversion was performed using an EZ DNA Methylation-Lightning Kit (Zymo Research). Each bisulphite conversion reaction contained 500 pg of unmethylated lambda DNA (Promega), which was used as a control to verify that bisulphite conversion efficiencies were at least 98%. Libraries were quantified using a KAPA Library Quantification Kit (KAPA Biosystems) on a 7900HT Real-Time PCR System (Thermo Fisher Scientific). Six samples were multiplexed per lane and libraries were clustered and sequenced using HiSeq ® PE Cluster Kit v4 -cBot ™ and HiSeq ® SBS Kit v4 kits (Illumina). Pairedend sequencing (2 x 125 bp) was performed using a HiSeq 1500 (Illumina) to a depth of~600M reads per sample. To provide sufficient coverage each batch was sequenced over 2 high output runs.

Exploratory Analyses of DNA Methylation and Gene Expression
To concurrently explore the DNA methylation and gene expression data, Multi-Omics Factor Analysis (MOFA) was applied (v1.0.0) (28). In short, MOFA performs unsupervised dimensionality reduction simultaneously across multiple data modalities from the same sample through a small number of inferred latent factors, enabling the detection of co-ordinated changes between the different data modalities (28). Here, we used the 5,000 most variable CpGs and genes as input and with the number of latent factors set to 9, the tolerance to 0.1 and the factor threshold to 0.02.
All subsequent analyses were performed in R (v3.5.0) (33). Gene-level counts were generated from the transcript abundances using tximport (v1.12.0) (34). Allosome-associated genes were removed to mitigate obvious sex effects. Differential Gene Expression (DGE) analyses were conducted using DESeq2 (v1.22.2) (35) where SJC66 high was compared with SJC66 low whilst correcting for sex, age and DMARD usage using the following design formula: Gene Expression ∼ sex + age + DMARD usage + SJC66 dichotomized DMARD usage was a binary variable defined by the usage of any medication: conventional DMARDs (cDMARD, including methotrexate) or biological DMARDs (bDMARD). As Kallisto provided abundance levels for individual transcripts, Gene Differential Expression (GDE) analyses were also conducted to identify genes where particular transcripts were differentially expressed (36). In short, differential expression analyses was performed using DESeq2 and the resulting p-values were combined using the Lancaster aggregation method found in aggregation (v1.0.1) (36), where observations were weighted by the base expression. DGEs and GDEs were defined as genes with a false discovery rate (FDR)-adjusted p-value less than 0.05.

DNA Methylation Data Analysis
Quality assessment of the raw reads was performed using FastQC (v0.11.2) (29). Adapter and quality trimming was performed using Skewer (v0.1.123) (37) and a quality filter of 20. To assess bisulphite conversion rates, Bismark (v0.14.1) (38) was used to align the reads to the genome of the phage lambda, and again for alignment to the GRCh38 build of the human genome. Postalignment filtering of unmapped reads, reads aligning at multiple locations and reads with a mapping score lower than 10 was carried out using SAMtools (39).
All subsequent analyses were performed in R (v3.5.0). CpG loci located on the allosomes were removed to mitigate the sex effect. The differential methylation analyses were performed using dmrseq (v1.2.5) (40), where we contrasted SJC66 high with SJC66 low whilst correcting for sex, age and DMARD usage using the following design formula: Differentially Methylated Regions (DMRs) were annotated using ChipPeakAnno (v3.16.1) (41) to genes if the DMR was located within 5,000 bp upstream or 1,000 bp downstream of the gene as obtained from Gencode (v29) (42).

Integrated DNA Methylation and Gene Expression Analysis
Integrated analyses were based on the DMRs and GDEs found through the separate DNA methylation and gene expression analyses. The overlap between DMRs and GDEs were called Genes displaying both Differential Expression and Methylation (GDEMs). For each GDEM the median percentage methylation was calculated for all constituent CpGs per sample and correlated with the log 2 transformed expression counts to obtain the Pearson correlation coefficient. Confidence intervals (95%) were calculated through 1000 bootstraps for each GDEM. In short, 17 samples were randomly drawn from the original samples with replacement, whereupon the Pearson correlation coefficient was calculated. This process was repeated 1000 times to generate the empirical distribution function, which was then used to estimate the confidence intervals (43). The aforementioned bootstrapping approach was performed using the boot (v1.3) package (44). For inferential purposes, p-values were calculated by means of a permutation approach. In short, per GDEM, 1000 sets of consecutive CpGs equal to the length of the observed DMR were sampled and correlated with the gene expression signal as described above, after which the proportion of correlation coefficients higher than the observed correlation coefficient was calculated yielding the p-value.

Functional Enrichment, Cell Type Enrichment and Protein-Protein Interaction Network Analyses
Gene set enrichment analyses were performed using fgsea (v1.8.0) (45) using the Metabase pathways terms (46) as reference. Metabase pathway terms with an FDR-adjusted p-value less than 0.05 were considered significant.
Cell proportions were imputed using xCell (v1.1.0) (47), where transcripts per million were used to estimate the proportion of each of the 64 immune and stromal cell types. Subsequent linear regressions were performed to calculate the p-values to assess statistical significance. Again, we compared SJC66 high with SJC66 low also correcting for age, sex and DMARD usage.
Protein-protein interaction (PPI) network analyses were performed using the STRING (v11) database (48), in order to identify whether a set of genes was over-represented for interactions. In short, the PPI analysis returned networks of genes where the encoded proteins interacted, co-expressed or coevolved with one another, based on text mining, curated databases, and experimental data.

Data Availability
The datasets generated and analyzed for this study can be found in the ArrayExpress repository under accession number E-MTAB-6638 and E-MTAB-6684 for WGBS and RNA-Seq, respectively. All code is hosted on GitHub at https://github. com/enricoferrero/BTCURE.

Swollen Joint Count Is Associated With the Latent Factor Explaining Variance in Gene Expression and DNA Methylation
We profiled the DNA methylome and transcriptome of 17 synovial tissue samples (Table S1) using whole genome bisulphite sequencing (WGBS) and RNA-sequencing (RNA-Seq). We initially attempted to link gene expression changes to disease duration as well as to cross-patient variations in the Disease Activity Score of 28 joints (DAS28) variables, but these analyses resulted in weak and non-biologically relevant signals, and were ultimately deemed inconclusive for this set of samples (data not shown). Principal Component Analysis (PCA) indicated that variation in both DNA methylation and gene expression were independently correlated with the swollen joint count in 66 joints (SJC66; Figures 1A, B). Comparison of the first principal component of both the DNA methylation and gene expression data suggested agreement with samples 33, 19, 3, 12A and 25 broadly separating from the other samples for both modalities ( Figure 1C). While sample 33 appeared to be an outlier based on the DNA methylation data the removal thereof did not alter the correlation substantially ( Figure S1). To further explore DNA methylation and gene expression at a genome-wide level in an integrative fashion, we performed variance decomposition using multi-omics factor analysis (MOFA) (28). MOFA infers a set of latent factors that capture sources of variability across different measured -omic modalities of the same samples. We found that most variance (approx. 70%) was better explained by gene expression as compared to DNA methylation ( Figure 1D). Further decomposition of the variance identified 8 latent factors, with LF1 explaining 40% of the variance in gene expression, whereas variance in methylation was more evenly distributed amongst all eight latent factors. Focusing on LF1, we observed a marked separation between samples with SJC66 of 9 and above compared to those with SJC66 of 8 or less ( Figures 1E and S2A, B). While most samples were obtained from the knee, two were obtained from the ankle. Interrogation of the first latent factor did not indicate any correlation with the source of the sample ( Figure S2C). Importantly, sex was also strongly associated with LF1 (p-value = 8.8E-03; Figure S2D) where samples with SJC66 of 9 and above were mostly males and most samples with SJC66 equal to or fewer than 8 were mostly females. Therefore, subsequent comparative analyses accounted for the imbalance in sex by removal of allosome-associated genes as well as including sex as covariate in linear models. females. Therefore, subsequent comparative analyses accounted for the imbalance in sex by removal of allosome-associated genes as well as including sex as covariate in linear models.

Differences in DNA Methylation and Gene Expression in Patients With High and Lower Numbers of Swollen Joints Are Associated With Immune Response and Cell Adhesion Pathways
Having observed genome-wide differences through exploratory analyses in both DNA methylation and gene expression data that associated with SJC66, we next investigated which regions and genes were differentially methylated and expressed. At this point we investigated DNA methylation and gene expression separately. While samples with SJC66 = 0 were included for the aforementioned exploratory analyses, they were excluded from subsequent comparative analyses as they were medically not a homogeneous group consisting of very early arthritis as well as late arthritis with no swelling following medication. The swollen samples were stratified according to the separation observed in the exploratory analyses ( Figure 1E) and we compared SJC66 high (SJC66 ≥ 9) with SJC66 low (SJC66 < 8) correcting for age, sex and DMARD usage. Comparative methylation analysis identified 3,536 DMRs ( Figure 2A and Table S2), where 2,140 were hypomethylated and 1,396 were hypermethylated. The majority of DMRs were located within 1 Kb of a TSS or in distal regions ( Figure 2B). Notably, the most statistically significant DMRs spanned regions larger than 10 Kb (Table S2).  Figure  2D) and exceeded 15 Kb in length. While the MIRLET7BHGassociated DMR was hypermethylated, the MIR10B-associated DMR was hypomethylated when comparing SJC66 high with SJC66 low . Functional analysis of the DMRs identified several pathways with evidence of differential methylation. Among the top 10, we observed that the NGF/TrkA MAPK pathway, immune response pathways including antigen presentation by MHC class II, macrophage and dendritic cell phenotype shifts, as well as CCR3 signalling in eosinophils, were hypermethylated ( Figure 2E and Table S3).
Comparing SJC66 high with SJC66 low at the gene expression level identified 142 DGEs, of which 106 up-regulated and 36 down-regulated ( Figure 3A and Table S4). The top 2 DGEs, chemokine ligand 13 (CCL13) ( Figure 3B) and C-Type Lectin Domain Containing 10A (CLEC10A) ( Figure 3C), were both found to be more highly expressed in the SJC66 high samples.  Functional analyses revealed a striking similarity with the differential methylation results, where genes associated with antigen presentation by MHC class II as well as the NGF/TrkA MAPK pathways were overexpressed ( Figure 3D and Table S5).

Transcripts Associated With Differentially Methylated and Expressed Genes Display Concordant Expression and Identify Nodal Points for Key Interactions
Having observed pathway-concordant differences in DNA methylation and gene expression, we were interested in identifying specific genes that were both differentially methylated and expressed. Although DNA methylation has traditionally been associated with gene-level expression, emerging evidence shows that it also regulates alternative splicing (49)(50)(51). Therefore, we complemented our Differential Gene Expression (DGE) analysis, which masks transcript-level dynamics, with Gene Differential Expression (GDE) analysis, which identifies genes that display transcript-level differences, by combining the p-values of individual transcripts associated with a single gene (36). We identified 290 genes that displayed perturbations in the expression of their transcripts (GDEs ;  Table S6). Combining the GDEs with the DMRs yielded 97 unique DMRs associated with 59 unique genes, which we termed Genes Differentially Expressed and Methylated (GDEMs) (Figures 4A, B). We observed that most transcripts, those A B D C FIGURE 3 | SJC66 high versus SJC66 low differential expression analysis. (A) Volcano plots depicting the -log 10 (p-value) on the y-axis relative to the difference in log 2 fold change when comparing the SJC66 high with SJC66 low samples on the x-axis. Colours represent the non-significant genes (orange), the significant genes (blue) and the significant genes with a log 2 fold change of larger than 1 (green; "sig.   Figure 4B). Investigation of the methylation status of the gene overlaid onto the transcript location indicated that the observed DMRs for KIAA1191, KLC1, CDC16, FASTK and GPR132 were located in the promoter shared by most transcripts ( Figure 4C). The mechanisms that could cause opposite direction of expression in these genes are currently unclear although studies in Arabidopsis have identified a methylation reader complex that can enhance rather than suppress gene transcription in the presence of methylation (52). Our work therefore supports the rationale for further validation and mechanistic studies. Functionally, we observed that the 59 GDEMs were primarily over-represented for immune response-associated pathways, specifically T-lymphocyte associated pathways ( Figure 5A). We interrogated which of the GDEMs encoded known interaction partners by querying the STRING database for documented interactions (48). Almost half (46%) of the GDEMs encoded for interacting proteins with the most connected GDEMs being ITGB2 (11 interactions) and LCP2 (9 interactions) ( Figure 5C). Both ITGB2 and LCP2 were differentially methylated at multiple locations, with the largest visible differences occurring within the TSS and downstream thereof ( Figures 5D, E). Transcript-wise, ITGB2-transcripts ENST00000498666 and ENST00000397852 as well as LCP2 transcript ENST00000046794 were most differentially expressed. Expression Quantitative Trait Methylation (eQTM) analyses confirmed strong inverse correlations between the differences in methylation in the promoter regions of ITGB2 (21:44918461-44921815) and LCP2 (5:170295513-170298924) with transcripts ENST00000498666 (r = -0.9; p-value = 1E-04) and ENST00000046794 (r = -0.9; p-value = 1E-04), respectively ( Table 1). While the association between the ITGB2 promoter DMR and ENST00000397852 expression was non-significant (p-value = 0.2353), the correlation  coefficient remained high (r = -0.87) ( Table S7). Nonetheless, given the centrality of ITGB2 and LCP2 among the GDEMs would make them interesting candidates in future targeted studies.

Estimated Cellular Composition Suggests Lower Proportion of Neuronal Cells in SJC66 high
Systematic differences in DNA methylation and gene expression could reflect changes in the cellular composition. To this end, we estimated the cellular composition using the transcriptome data as input for xCell, which is capable of estimating enrichment scores of 64 immune and stromal cell types (47). By comparing the estimated proportions from SJC66 high with SJC66 low we identified significant differences for 6 cell types: neurons, dendritic cells (DCs: all, conventional and immature), megakaryocytes and platelets ( Figure 6 and Table S8). Expectedly, higher enrichment scores for DCs (all subtypes) and platelets were estimated for the SJC66 high samples. By contrast, lower proportions of neuron and megakaryocyte signatures were observed for the SJC66 high samples. Notably, the difference in neuronal enrichment was found to be the most statistically significant with an almost fourfold difference when comparing SJC66 high with SJC66 low .

DISCUSSION
In this study, we highlight insights into RA progression by combining the outputs of parallel whole-genome DNA methylome and transcriptome analyses on extracted preparations of synovial biopsies, from auto-antibody positive individuals, early arthritis patients and patients with established RA, stratified by the number of swollen joints. We observed that synovia from patients with a higher number of swollen joints (SJC66 ≥ 9) were different at the level of DNA methylation and gene expression from synovia from patients with a lower number of swollen joints. Specifically comparing SJC66 high with SJC66 low revealed 3536 DMRs and 142 DGEs, with both datasets primarily enriched for pathways associated with immune responses. The most significant difference in methylation was found spanning the promoter regions of MIRLET7B and MIR10B. Interestingly, mouse miR-let-7b has been shown to provoke arthritic joint inflammation by remodeling naïve myeloid cells into M1 macrophages via TLR-7 ligation (53) and can augment disease severity (54). MIR10B has been shown to regulate Th17 cells in patients with ankylosing spondylitis (55) but no studies have specifically associated it with RA. At the level of transcription, CCL13 and CLEC10A were found to be the most differentially expressed. CCL13 (MCP-4) is an extensively studied chemokine that is thought to be involved with RA pathogenesis and disease progression (56)(57)(58). By contrast, not much is known about the role of CLEC10A in RA besides it being highly expressed on immature dendritic cells (DCs), monocyte-derived DCs and alternatively activated macrophages (59), as well as having been observed in the inflamed synovium of patients with active RA (60). Chemokines CCL13, CCL8, CXCL11, CXCL10, and CXCL9 regulate the recruitment of leukocytes into tissue and have therefore been implicated in the pathogenesis of RA (61). Differential methylation was observed in the vicinity of the promoter for CCL13, CXCL11, and CXCL9. Such results support a role for epigenetic/transcriptional processes in the spread of pathology to additional joints. While definitive mechanisms of joint spreading remain elusive, possible roles for immune cell migration due to chemokine expression (62,63) could be further evaluated based on our data. Altogether, we observed that 3% of DMRs associate with 20% of the differentially expressed GDEs. It is not surprising that not all DMRs could be linked to genes as a large number are found in distal intergenic (18.6%) or intronic (28.3%) regions, making any functional inference challenging. Of the genes that presented Expression quantitative trait methylation (eQTM) analysis of the GDEMs representing the correlation between methylation and transcript expression. Key in table legend: Gene = HGNC gene symbol, DMR = co-ordinates of the DMR (GRCh38), DMR p-value = p-value associated to the differential methylation analysis, EnsT = Ensembl transcript ID, DTE p-value = p-value associated to the differential transcript expression analysis, eQTM r = Pearson correlation coefficient for the methylation-expression correlation and the 95% confidence intervals, eQTM pvalue = p-value associated to the methylation-expression correlation. An extended parsable table including the full eQTM analysis for all GDEMs can be found in Table S6.
both differentially expression and methylation, protein-protein interaction networks indicated that half encoded for interacting proteins, suggesting that the observed GDEMs function together. The most interconnected GDEMs appeared to be ITGB2 and LCP2, with multiple regions of differential methylation observed surrounding both genes. While we observed transcript differential expression, most transcripts belonging to ITGB2 and LCP2 behaved similarly and all displayed reasonably strong inverse correlations with the DMRs located in the promoter area. ITGB2 encodes an integrin, which would typically be involved in cell-surface mediated signalling. We observed that the gene encoding ITGB2's interaction partner integrin alpha L (ITGAL) was also differentially methylated and expressed. ITGB2 and ITGAL together form lymphocyte function-associated antigen (LFA-1), which interacts with intracellular adhesion molecule 1 (ICAM-1 or CD54) resulting in an enhanced immune cell influx into the synovial tissue (64,65). Inhibiting LFA-1 has been reported to reduce inflammation and joint destruction in murine models of arthritis (66). Functionally, the 59 GDEMs were primarily over-represented for immune response-associated pathways, specifically Tlymphocyte associated ones. It would be fascinating to understand why the DNA methylome and transcriptome between patients with a SJC66 of 9 and above relative to patients with a SJC66 of 8 present with such a sudden split instead of a gradual difference. It is clear that inflammation is likely an important factor contributing to the observed differences as patients with a high SJC66 also generally present higher levels of inflammation as expressed through using erythrocyte sedimentation rate (ESR) or concentration C-reactive protein (CRP). Our transcriptomic data indeed suggested an increased proportion of immune cells among SJC66 high samples, as would be expected while pathology develops and cells migrate into the affected joints. Previous work has shown that clinically manifest arthritis in established RA is associated with increased infiltration of leukocytes. In synovial tissue samples from clinically involved joints, scores for infiltration by DCs are consistently higher than in clinically uninvolved joints obtained simultaneously from the same RA patients (67). Importantly, when comparing different clinically inflamed joints from the same RA patient simultaneously, leukocyte infiltration in one inflamed joint was shown to be representative of that in other inflamed joints, supporting the notion that leukocytes migrate from one joint to another (68). Indeed, there is continuous influx of leukocytes into the joints in established RA (69). We postulate that if synovial leukocytes exhibit properties that would facilitate cell migration, arthritis might spread from one inflamed joint to another. The results presented here support a disease mechanism in which, after development of clinically established RA, inflammatory and cell adhesion-associated processes play a key role in the progression of RA to greater joint involvement (70)(71)(72). Interestingly, we also observed differences of neuronal signatures suggesting a lower relative enrichment of neurons among SJC66 high samples. In addition, enrichment analyses on the DMRs suggested hypermethylation of genes encoding nociception receptors, which are typically associated with peripheral sensory neurons (73,74). A similar decrease in neuronal signature has previously been associated with RA severity, where the authors noted a potential role in the maladaptive response towards damage (75). This is consistent with a more general loss of anti-inflammatory control by the nervous system in RA (76). Important to note is that our observations are based on estimates made by xCell, which can only calculate enrichment scores based on signatures rather than absolute values of cells. We are therefore unable to discern whether a population increased in size or a different population had decreased. Ideally, a similar estimate would have been generated based on the DNA methylation data, but the currently available reference datasets do not include the cell types measured available in xCell.
There are two limitations of this study, namely the fact that sex confounds the separation between SJC66 high and SJC66 low and the limited sample size. While we have sought to mitigate the confounding effect of sex by removing genes and CpGs located on the allosomes as well as by including sex as a covariate in our analyses, we acknowledge that we cannot fully eliminate the possibility that a sex effect is present. Accordingly, validation studies would be necessary where the DMEGs are verified using a larger, independent cohort while controlling for an interaction effect between sex and SJC66. The observed differences in transcript expression could be validated using a quantitative PCR approach with primers designed specifically against particular transcripts. Similarly, for validating the DMRs, targeted bisulphite-sequencing using primers for the regions of interest would be a cost-effective approach.
In conclusion, our study constitutes an exploratory analysis of whole genome DNA methylation and gene expression data performed on primary synovial tissue material from autoantibody positive arthralgia patients without arthritis as well as patients with early and established RA patients. Where previous studies investigated cells from patients with RA versus disease controls and were potentially limited by their use of cultured cells, we focus on an integrative analysis of epigenetic marks and alternative splicing associated with swelling spread, providing novel insights into the mechanisms of disease progression towards more severe phenotypes. Nonetheless, further validation is necessary if the identified target genes are to be used for monitoring or treatment of the swelling and associated inflammatory processes in the joints of RA patients.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found below: https://www.ebi.ac. uk/arrayexpress/, E-MTAB-6638; https://www.ebi.ac.uk/ arrayexpress/, E-MTAB-6684.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by AMC medisch ethisch toetsingscommissie under MEC 02/152, MEC 05/107, and MEC 07/253 for the Synoviomics, Pre-synoviomics, and Synoviomics II cohorts, respectively. Written informed consent was obtained from the individual(s) for the publication of any potentially identifiable images or data included in this article.

AUTHOR CONTRIBUTIONS
EF, HL, DT, CL, MS, DG, PT, and RP conceived the study. KM and GR carried out the laboratory experimental work. EF and CL conceived the analytical design. EF and AL performed the data analysis. HL, DT, CL, MM, PH, and WJ helped supervise the secondary data analyses. EF, AL, KM, HL, and DT led the writing of the manuscript. All authors contributed to the article and approved the submitted version.

ACKNOWLEDGMENTS
We would like to thank the BTCure research consortium for providing access to the samples, Victor Neduva (Target and Pathway Validation, Target Sciences, GSK) for transferring the raw data files, Erika Cule (Research Statistics, GSK) for input into statistical methodology, and Yee Ying Chang (Clinical Genetics, Genome Diagnostics Laboratory, Amsterdam UMC, University of Amsterdam) for implementing the eQTM function in R. This manuscript has been released as a Pre-Print at https://papers. ssrn.com/sol3/papers.cfm?abstract_id=3576744 (77).