Integrating Genome-Wide Association Studies With Pathway Analysis and Gene Expression Analysis Highlights Novel Osteoarthritis Risk Pathways and Genes

Osteoarthritis (OA) is the most common degenerative joint disorder worldwide. To identify more genetic signals, genome-wide association study (GWAS) has been widely used and elucidated some OA susceptibility genes. However, these susceptibility genes could only explain only a small part of heritability of OA. It is suggested that the identification of disease-related pathways may contribute to understand the genomic etiology of OA. Here, we integrated the GWAS into pathway analysis to identify novel OA risk pathways. In this study, we first selected 187 independent genetic variants identified by GWAS (P < 1.00E−05) and found that most of these genetic variants are noncoding mutations. We then conducted an expression quantitative trait loci analysis and found that 165 of these 187 genetic variants could significantly regulate the expression of nearby genes. Third, we identified OA susceptibility genes corresponding to these genetic variants, conducted a pathway analysis, and identified novel OA-related KEGG pathways, GO biological processes, GO molecular functions, and GO cellular components. In KEGG database, transforming growth factor β signaling pathway is the most significant signal (P = 5.98E−05) and is the only pathway after the BH multiple-test adjustment with false discovery rate (FDR) = 0.02. In GO database, we identified 24 statistically significant GO biological processes, one statistically significant GO molecular function, and five statistically significant GO cellular components (FDR < 0.05). These signals are related with chondrocyte differentiation and development, which are all known biological pathways associated with OA. Finally, we conducted an OA case–control gene expression analysis to evaluate the differential expression of these OA risk genes. Using an OA case–control gene expression analysis, we showed that 44 risk genes were suggestively differentially expressed in OA cases compared with controls (P < 0.05). Three genes, WWP2, COG5, and MAPT, were statistically differentially expressed in OA cases compared with controls (P < 0.05/122 = 4.10E−04). Hence, our findings may contribute to understanding the genomic etiology of OA.


INTRODUCTION
Osteoarthritis (OA) is the most common degenerative joint disorder worldwide (Martel-Pelletier et al., 2016). Osteoarthritis could cause pain and disability in elderly people (Parmet et al., 2003;Zeggini et al., 2012). It is considered that OA is caused by the combination of risk factors with increasing age and obesity (Parmet et al., 2003;Martel-Pelletier et al., 2016). Osteoarthritis is a complex disease and has a strong genetic component (Zeggini et al., 2012). To identify more genetic signals, genome-wide association study (GWAS) has been widely used and elucidated some OA susceptibility genes (Zeggini et al., 2012;Styrkarsdottir et al., 2014;Styrkarsdottir et al., 2018;Zengini et al., 2018;Tachmazidou et al., 2019). However, these susceptibility genes could explain only a small part of heritability of OA (Zeggini et al., 2012;Styrkarsdottir et al., 2014;Styrkarsdottir et al., 2018;Zengini et al., 2018;Tachmazidou et al., 2019). It is suggested that the identification of disease-related pathways may contribute to understand the genomic etiology of OA (Cibrian Uhalte et al., 2017).
In recent years, kinds of bioinformatics software and databases have been developed, especially for the identification of disease-related pathways such as VEGAS (Versatile Genebased Association Study) (Liu et al., 2010), INRICH (INterval enRICHment analysis) (Lee et al., 2012a), and GSA-SNP2 (Yoon et al., 2018). Meanwhile, these methods have been widely used in OA-related disease rheumatoid arthritis (RA) (Eleftherohorinou et al., 2009;Eleftherohorinou et al., 2011;Lee et al., 2012b;Song et al., 2013;Zhang et al., 2016), but not in OA. Here, we first selected all significant OA risk variants identified by GWAS and evaluated their distribution in genome coding and noncoding regions. Second, we then conducted an expression quantitative trait loci (eQTLs) analysis to evaluate the effect of these genetic variants on gene expression. Third, we performed a pathway analysis of OA susceptibility genes detected by OA-related genetic pathways. Finally, we conducted an OA case-control gene expression analysis to evaluate the differential expression of these OA risk genes.

Selection of OA Risk Variants
We selected the potential OA risk variants identified by GWAS by searching for the NHGRI GWAS catalog database using the keyword "osteoarthritis" (Welter et al., 2014;MacArthur et al., 2017;Buniello et al., 2019). Finally, we selected 187 genetic variants associated with OA (P < 1.00E−05) (Welter et al., 2014;MacArthur et al., 2017;Buniello et al., 2019). These variants are associated with OA, OA of the hip, OA of the knee, or OA of the hand. Here, we provided all related information about these variants in Supplementary Table 1.

Identification of OA Risk Genes
For each of these 187 genetic variants, if this variant is an intronic variant, we select its corresponding gene. If this variant is an intergenic variant, we select its corresponding nearest upstream and downstream genes. The gene set consists of 202 risk genes using all these 187 genetic variants, as provided in Supplementary Table 1.

Function Analysis
We used the HaploReg tool (v4.1) to annotate these 187 genetic variants and evaluate how many genetic variants are coding and noncoding mutations (Ward and Kellis, 2012;Ward and Kellis, 2016). The reference information is from the 1000 Genomes Project (Ward and Kellis, 2012;Ward and Kellis, 2016).

eQTLs Analysis
When noncoding genetic variants are identified, we conducted an eQTLs analysis to evaluate the effect of these genetic variants on gene expression using Phenol Scanner (v2), a database of human genotype-phenotype associations (Staley et al., 2016;Kamat et al., 2019).

Identification of OA Risk Pathways
Here, we selected the pathway resources from KEGG and GO databases, as provided in Web-based Gene Set Analysis Toolkit (WebGestalt version 2019) (Wang et al., 2017). To identify a potential OA risk pathway, the hypergeometric test was used to evaluate an overrepresentation of the OA risk genes among all the genes in the selected pathway, as did in previous studies (Bao et al., 2015;Zhao et al., 2015;Jiang et al., 2017;Liu et al., 2017;Wang et al., 2017). For a given pathway, the P value of observing more than K OA risk genes could be calculated by where N is the total number of genes that are of interest (the reference gene list), S is the number of all OA risk genes (here, n = 202), m is the number of genes in the pathway, and K is the number of OA risk genes in the pathway, as did in previous studies (Liu et al., 2012;Liu et al., 2013;Liu et al., 2014;Quan et al., 2015;Shang et al., 2015;Xiang et al., 2015). Using WebGestalt version 2019, the minimum number of genes for a category is 5, and the maximum number of genes for a category is 2000. In addition, we used the BH multipletest adjustment method to adjust the P value of each pathway. A specific pathway with adjusted FDR < 0.05 is considered to be statistically significant. A specific pathway with unadjusted P < 0.05 is considered to be suggestively significant.

OA Case-Control Gene Expression Analysis
We conducted an OA case-control gene expression analysis to evaluate the differential expression of these 202 risk genes, as provided in Supplementary Table 1. The gene expression profile dataset is from peripheral blood mononuclear cells of 106 OA patients and 33 sex and age-matched healthy controls, which is a subset of the Genetics osteoarthritis and Progression study (Ramos et al., 2014). Here, we used the interactive web tool GEO2R to identify genes that are differentially expressed in OA patients and healthy controls. A gene with unadjusted P < 0.05 is considered to be suggestively differentially expressed. In addition, we used the Bonferroni correction method to adjust these P values and define the statistically differential expression, as described in a previous study (Krzywinski and Altman, 2014).

Function Analysis
Function analysis using HaploReg tool (v4.1) showed that eight genetic variants, rs12193102, rs35611929, rs375575359, rs528981060, rs532464664, rs541612392, rs547116051, and rs547181612, were not found in 1000 Genomes Phase 1 data. In the remaining 179 genetic variants, 11 genetic variants are coding mutations, and the other 168 genetic variants are noncoding mutations. More detailed information is provided in Supplementary Table 2.

Identification of OA-Related KEGG Pathways
Using these 202 OA risk genes, we identified 29 suggestively significant KEGG pathways (unadjusted P < 0.05). Here, we provide the top 20 significant pathways, as provided in Table 1. The transforming growth factor β (TGF-β) signaling pathway is the most significant signal (P = 5.98E−05). Importantly, this is the only pathway after the BH multiple-test adjustment with FDR = 0.02. In addition, we identified other OA risk pathways including inflammatory bowel disease (IBD), human T-cell leukemia virus 1 infection, RA, influenza A, asthma, tuberculosis, prostate cancer, hematopoietic cell lineage, transcriptional misregulation in cancer, allograft rejection, mitogen-activated protein kinase (MAPK) signaling pathway, T H 7 cell differentiation, graft-versushost disease, toxoplasmosis, type 1 diabetes mellitus, cell cycle, intestinal immune network for IgA production, autoimmune thyroid disease, and ubiquitin-mediated proteolysis. More detailed information is described in Table 1.

Identification of OA-Related GO Biological Processes
Using these 202 OA risk genes, we identified 24 statistically significant GO biological processes (FDR < 0.05). These biological processes could be mainly divided into two classes including differentiation and development. The differentiation-related biological processes consist of chondrocyte differentiation, regulation of chondrocyte differentiation, positive regulation of chondrocyte differentiation, regulation of epithelial cell proliferation, regulation of cell proliferation, and epithelial cell proliferation. The development-related biological processes consist of regulation of cartilage development, cartilage development, connective tissue development, chondrocyte development, skeletal system development, liver development, hepaticobiliary system development, positive regulation of

Identification of OA-Related GO Molecular Functions
Using these 202 OA risk genes, we identified only one statistically significant GO molecular function (FDR < 0.05) TGF-β receptor binding (P = 9.30E−06). Here, we provide the top 20 significant molecular functions, as provided in Table 3.
Most of these molecular functions are related with binding including TGF-β receptor binding, phospholipid binding, TGF-β binding, peptide antigen binding, antigen binding, phosphatase binding, type II TGF-β receptor binding, protein phosphatase binding, lipid binding, bridging protein binding, apolipoprotein binding, cytokine receptor binding, and cytokine binding.

Identification of OA-Related GO Cellular Components
Using these 202 OA risk genes, we identified five statistically significant GO cellular components (FDR < 0.05) including

OA Case-Control Gene Expression Analysis
Of the selected 202 risk genes, 122 risk genes were included in the OA case-control gene expression dataset, as provided in Supplementary Table 4. The results showed that 44 of these 122 risk genes showed suggestively differential expression in OA cases compared with controls (P < 0.05). Importantly, three genes including WWP2, COG5, and MAPT showed statistically different expression in OA cases compared with controls (P < 0.05/122 = 4.10E−04), as described in Table 5.

DISCUSSION
In this study, we first selected 187 independent genetic variants identified by GWAS (P < 1.00E−05) and found that most of these genetic variants are noncoding mutations. We then conducted an eQTLs analysis and found that 165 of these 187 genetic variants could significantly regulate the expression of nearby genes. Third, we identified OA susceptibility genes corresponding to these genetic variants, conducted a pathway analysis, and identified novel OA-related KEGG pathways, GO biological processes, GO molecular functions, and GO cellular components.
In KEGG database, TGF-β signaling pathway is the most significant signal (P = 5.98E−05) and is the only pathway after the BH multiple-test adjustment with FDR = 0.02. Our finding is consistent with previous findings. Recent findings provide substantial evidence that TGF-β signaling contributes to OA development and progression (Shen et al., 2014;Fang et al., 2016). In addition, we found the association of OA with other human diseases including IBD, RA, asthma, prostate cancer, hematopoietic cell lineage, transcriptional misregulation in cancer, allograft rejection, MAPK signaling pathway, graftversus-host disease, type 1 diabetes mellitus, and autoimmune thyroid disease. Previous study supported our finding about the association of human T-cell leukemia virus 1 infection pathway with OA. It is reported that human T lymphotropic virus type I retrovirus infection could alter the expression of inflammatory cytokines in primary OA (Yoshihara et al., 2004). Here, we highlighted the association of T H 17 cell differentiation pathway with OA. Evidence showed that complement could drive T H 17 cell differentiation and trigger autoimmune arthritis (Hashimoto et al., 2010;Li et al., 2017).
In GO database, we identified 24 statistically significant GO biological processes, one statistically significant GO molecular function, and five statistically significant GO cellular components (FDR < 0.05). These signals are related with chondrocyte differentiation and development, which are all known biological pathways associated with OA. Take the chondrocyte differentiation (GO:0002062), for example, previous studies also supported the hypertrophic differentiation of chondrocytes in OA (Dreier, 2010;Goldring, 2012). Some biological pathways are related with TGF-β signaling binding, such as TGF-β receptor binding, TGF-β binding, and type II TGF-β receptor binding. The phospholipid binding (GO:0005543) is the second significant signal among the top 20 OA risk molecular functions identified in GO database. Evidence shows that lipids are important nutrients in chondrocyte metabolism (Villalvilla et al., 2013). The lipid availability is important to keep cartilage status and OA development (Villalvilla et al., 2013).
Using an OA case-control gene expression analysis, we showed that 44 of these 122 risk genes were suggestively differentially expressed in OA cases compared with controls (P < 0.05). Three genes, WWP2, COG5, and MAPT, were statistically differentially expressed in OA cases compared with controls (P < 0.05/122 = 4.10E−04).
In summary, we believe that these findings provide further insight into the underlying genetic mechanisms for these newly identified OA risk genes. Although these are interesting findings, we also realize some limitations to our study. Until now, pathway analyses of RA GWAS datasets have widely been reported, but not OA GWAS datasets (Eleftherohorinou et al., 2009;Eleftherohorinou et al., 2011;Lee et al., 2012b;Song et al., 2013;Zhang et al., 2016). In the future, we will compare our findings using top OA genetic variants with pathway analysis of OA using the whole GWAS datasets and further clarify the differences between top OA genetic variants and the whole OA GWAS datasets. In addition, a gene expression analysis in OA-related tissues is necessary to demonstrate that these pathways are deregulated in OA cases and controls. However, this kind of gene expression datasets in OA-related tissues is not available now. Hence, we will further evaluate our findings using gene expression datasets in OA-related tissues in the future.