A Convergent Study of Genetic Variants Associated With Crohn’s Disease: Evidence From GWAS, Gene Expression, Methylation, eQTL and TWAS

Crohn’s Disease (CD) is one of the predominant forms of inflammatory bowel disease (IBD). A combination of genetic and non-genetic risk factors have been reported to contribute to the development of CD. Many high-throughput omics studies have been conducted to identify disease associated risk variants that might contribute to CD, such as genome-wide association studies (GWAS) and next generation sequencing studies. A pressing need remains to prioritize and characterize candidate genes that underlie the etiology of CD. In this study, we collected a comprehensive multi-dimensional data from GWAS, gene expression, and methylation studies and generated transcriptome-wide association study (TWAS) data to further interpret the GWAS association results. We applied our previously developed method called mega-analysis of Odds Ratio (MegaOR) to prioritize CD candidate genes (CDgenes). As a result, we identified consensus sets of CDgenes (62–235 genes) based on the evidence matrix. We demonstrated that these CDgenes were significantly more frequently interact with each other than randomly expected. Functional annotation of these genes highlighted critical immune-related processes such as immune response, MHC class II receptor activity, and immunological disorders. In particular, the constitutive photomorphogenesis 9 (COP9) signalosome related genes were found to be significantly enriched in CDgenes, implying a potential role of COP9 signalosome involved in the pathogenesis of CD. Finally, we found some of the CDgenes shared biological functions with known drug targets of CD, such as the regulation of inflammatory response and the leukocyte adhesion to vascular endothelial cell. In summary, we identified highly confident CDgenes from multi-dimensional evidence, providing insights for the understanding of CD etiology.


INTRODUCTION
Crohn's Disease (CD) is one of the major forms of inflammatory bowel disease (IBD). CD has a prevalence of 26 to 200 per 100,000 person in populations with European ancestry (Loftus, 2004). Family studies have shown that CD has 0.25 to 0.42 heritability (Gordon et al., 2015). Dysregulated immune response to environmental factors such as gut microbiome (Khor et al., 2011;Jostins et al., 2012;Ananthakrishnan, 2013) has been reported in CD. Complex diseases like CD are usually affected by a large number of genetic factors and environment factors (Rivas et al., 2011). Recent genome-wide association studies (GWAS) of CD have successfully identified more than two hundreds diseaseassociated loci at the genome-wide significance level (Franke et al., 2010;Liu et al., 2015). However, these findings could only explain a moderate proportion of the heritability (Verstockt et al., 2018). Recently, integrating GWAS signals with transcriptomewide association study (TWAS) and expression quantitative trait loci (eQTL) annotation has become an effective approach to identify new susceptibility loci and has been successfully applied in several complex diseases including CD (He et al., 2013;Marigorta et al., 2017;Gusev et al., 2018). Other forms of genetic variants are also implied, such as copy number variation (CNV) and rare variants, and they are expected to have large effects (Visscher et al., 2017). For example, a genome-wide association study of CNVs identified IRGM (immunity-related GTPase family, M) and the HLA gene family for CD (Wellcome Trust Case Control Consortium et al., 2010). Several genes were reported to harbor rare variants associated with CD, such as NOD2 (Nucleotide Binding Oligomerization Domain Containing 2, Alias CARD15) and ADCY7 (Adenylate Cyclase 7) (Hunt et al., 2013;Luo et al., 2017). Apart from those genetic variants, epigenetic alternations were also observed in CD patients. For example, altered methylation levels in peripheral blood were reported for the genes MIR21 (MicroRNA 21), TXK (TXK Tyrosine Kinase), ITGB2 (Integrin Subunit Beta 2) and HLA loci in case-control studies (Adams et al., 2014;Ventham et al., 2016). Lastly, a number of transcriptome profiling studies have been conducted, revealing genes that were differentially expressed in CD compared to controls, such as IFITM1 (Interferon Induced Transmembrane Protein 1), STAT1 (Signal Transducer And Activator Of Transcription 1), TAP1 (Transporter 1, ATP Binding Cassette Subfamily B Member), and PSMB8 (Proteasome Subunit Beta 8) identified using endoscopic pinch biopsies (Wu et al., 2007) and SERPINB2 (Serine (or cysteine) proteinase inhibitor, clade B (ovalbumin), member 2, PAI 2), NCK2 (NCK Adaptor Protein 2), and ITGB3 (Integrin Subunit Beta 3) identified using peripheral blood mononuclear cell (PBMC) (Burczynski et al., 2006). Each of these unbiased, GWAS have provided unique insights and candidate pathogenic variants and genes to understand the etiology of CD. However, challenges remain in how to effectively integrate these heterogeneous association data that range in a wide variety of biological processes.
Considerable work have been developed by integrating highthroughput multi-omics data ranging from unsupervised data integration to supervised data integration (Jiang et al., 2014;Wang et al., 2015;Huang et al., 2017;Jia et al., 2017). However, most of these tools require domain expertise, especially for the investigated diseases. Under the assumption that the number of susceptibility genes to complex disease is limited (Yang et al., 2005), we developed an unsupervised machine learning approach named mega-analysis of Odds Ratio (MegaOR) to prioritize candidate genes from multiple omics data . MegaOR relies on that each single omics data was conducted with control of false discoveries using the domain specific criteria (e.g., fold change for gene expression studies and stringent genomewide significance threshold for GWAS data). We successfully demonstrated the method in schizophrenia . In this study, we collected five types of omics data, each representing a genome-wide association study of a molecular type with CD. We investigated the disease relevant tissues using unbiased GWAS data and conducted TWAS for CD in these tissues. By applying MegaOR, we prioritized consensus sets of candidate genes and investigated their characteristics using functional enrichment analysis and drug target crosstalk.

GWAS Summary Statistics
We collected the summary statistics from a GWA study for CD conducted by the International Inflammatory Bowel Disease Genetics Consortium (IIBDGC) . The study included 27,726 individuals (5,956 cases and 21,770 controls) of European ancestry genotyped using a combination of array platforms, including Affymetrix GeneChip Human Mapping 500K, Affymetrix Genome-Wide Human SNP Array 6.0, and Illumina HumanHap300 BeadChip. The genotype data were also imputed based on the 1000 Genomes Project reference panel (1000Genomes Project Consortium et al., 2015. In total, the GWAS summary statistics included association results for a total of 11,002,658 SNPs either genotyped or imputed (score > 0.3).

Gene Expression Data
We approached the gene expression data from a recent study that profiled the whole blood expression of 24 CD patients and 23 healthy controls (Ventham et al., 2016) (GEO accession ID: GSE86434). The expression data was generated using Illumina HumanHT-12 V4.0 expression BeadChip platform (GPL10558), which contained about 31,000 annotated genes with more than 47,000 probes. We used the online tool GEO2R 1 to conduct differential gene expression analysis. We compared the expression of whole blood mRNA between CD cases and controls. Following the method used in the original paper, log2 transformation was conducted for the expression data, and then Limma (R package) was used to adjust covariates (age and gender) to obtain the differentially expressed genes (DEGs) between CD cases and controls. Genes with fold change (FC) ≥ 1.5 or ≤ 0.67 and adjusted p-value < 0.05 (the Benjamini and Hochberg method) were defined as DEGs (Mitra et al., 2015;Ritchie et al., 2015;Hu et al., 2018).

Methylation Data
We obtained the methylation data from a recent study that conducted differential methylation analysis using 121 CD cases and 191 healthy controls (Ventham et al., 2016) (GEO accession ID: GSE87648). The study provided whole genome methylation using Illumina HumanMethylation450 BeadChip platform (GPL13534), which contained ∼485,000 probes. We requested the methylation results from the author of the study. This differential methylation genes was generated using whole blood leukocyte samples. In the original work (Ventham et al., 2016), the authors normalized the methylation matrix using the R package lumi and estimated the cell proportion by the R package minfi. Lastly, Limma was used to identify differentially methylated CpG probes. Probes were mapped to genes according to the annotation file of the chip (Jiang et al., 2016). For genes with multiple probes, we selected the most significant probe for the gene.

Gene-Based Association Test Using Pascal
As our analysis builds on genes and the GWAS summary statistics provided association results for SNPs, we compiled a p-value for each gene using the association results of SNPs mapped to the gene. Specifically, we considered all SNPs mapped to the gene body or 50 kb upstream or downstream of the gene. We used the method Pascal to calculate the gene-based p-values (Lamparter et al., 2016). Pascal utilizes the sums of chi-squares and controls potential biases from gene length, SNP density, and the local LD structure. We used the European panel as the reference, as similarly, did in a recent study (Sun et al., 2018).

Tissue-Specific Enrichment Analysis (TSEA)
To identify the tissues in which the GWAS genes were specifically expressed, we conducted a tissue specific enrichment analysis using our in-house R package, deTS (Pei et al., 2019a). deTS provides a preprocessed reference panel with 47 tissues (each with ≥ 30 samples) from the GTEx (v7) expression data (GTEx Consortium et al., 2017) and implements Fisher's Exact Test for the enrichment analysis. We applied deTS to genes defined by the Pascal results.

Transcriptome Wide Association Studies (MetaXcan)
Transcriptome-wide association study estimates genetically regulated expression (GReX) for each gene and conducts association studies between genes and traits by assessing the difference of GReX in trait samples and control samples. We utilized the method MetaXcan for a TWAS analysis of the CD GWAS summary statistics (Barbeira et al., 2018). The pre-calculated weight matrix was downloaded from http:// predictdb.org/. We utilized three disease-relevant tissues for the analyses, where were determined based on previous knowledge and deTS results.

Integrative Analysis of eQTL and GWAS Data (Sherlock)
Considering that many disease-associated genetic variants have regulatory roles, we applied the method Sherlock to integrate eQTLs and GWAS with the aim to identify concordant evidence between the two platforms (He et al., 2013). Sherlock uses a Bayesian statistical method to match the signature of genes from eQTLs to GWAS. As eQTL data have population and tissue specificity, we applied Sherlock for the CD GWAS data using the same tissues as for MetaXcan. A gene-based p-value was calculated from Sherlock for each gene in each tissue.

Mega-Analysis of Odds Ratio (MegaOR)
We adopted our previous work MegaOR to identify a consensus set of candidate genes that collectively had the most intensive load of evidence for their association with CD (hereafter referred as CDgenes). MegaOR took a multidimensional data matrix as the input. In each dimension, genes that were determined as significantly associated with the trait based on the domain-specific threshold were labeled as 1 while other genes that failed the significance threshold were labeled as 0. For example, in the category of gene expression, significantly differentially expressed genes [FDR < 0.05 and (FC) ≥ 1.5 or ≤ 0.67] were labeled 1 and other genes 0. The same preprocessing was performed for each dimensional data following the particular domain-specific thresholds. As a result, the multidimensional data matrix included only binary values. MegaOR took this binary data matrix and defined a combined OR , where OR represented the Odds Ratio for each dimension, d was the dimension of evidence, and µ was the average OR across dimensions. The part (OR−µ) 2 d was introduced as the penalty to control deviation of any dimensional OR and served to balance the multidimensional lines of evidence. MegaOR implemented an iterative optimization procedure to find the best set of genes (denoted by S) with the pre-defined size n such that at the stable status, genes in S had the best cOR. A workflow was illustrated in Figure 1. Further details can be found in our previous work .

Functional Enrichment Analysis
We used the R package RDAVIDWebService (version 1.16.0) for functional enrichment analysis. We focused on Gene Ontology (GO) and genetics association database (GAD) (Fresno and Fernandez, 2013). GO functional annotation tool (FAT) was used to filter out very broad terms based on a measured specificity of each term (not level-specificity). We further use the plug-in ClueGO of Cytoscape to display the relationship between genes and GO terms (Shannon et al., 2003;Bindea et al., 2009). Only GO terms with more than five CDgenes were demonstrated.

Drug Target Gene Enrichment Analysis
We queried the Therapeutic Target Database 2 to identify Food and Drug Administration (FDA) approved drugs that were used FIGURE 1 | Workflow of the study. DEG, differentially expressed genes; DMG, differentially methylated genes; p BH , the Benjamini and Hochberg method; FC, fold change; OR, Odds Ratio; CDgenes, Crohn's Disease genes.
for CD (Li et al., 2018). Meditation target genes for CD were extracted from the database.

Protein-Protein Interaction (PPI) Analysis
We searched the STRING database 3 to identify protein-protein interactions (PPIs) between CD drug target genes and our CDgenes (Szklarczyk et al., 2017). We selected Homo sapiens as the organism and considered only the PPIs that were experimentally validated with medium confidence > 0.35.

Multi-Dimensional Evidence for Crohn's Disease
Using the approaches described in methods, we organized our data into five major categories: Pascal (combined GWAS information), Sherlock (integrative information of GWAS and eQTL), MetaXcan (TWAS), gene expression (with DEGs labeled as 1), and methylation (with differentially methylated genes (DMGs) labeled as 1). Particularly for Sherlock and MetaXcan, the analyses were performed for different tissues and thus, each had multiple sets of omics data. Each dimension presents a unique biological aspect to assess the potential association between a gene and CD.
As previously reported, interpretation of disease-associated genetic variants are more appropriate in tissues that are related to the diseases, as genetic regulation has a strong tissue specificity.
Among the 1,668 genes, 1,287 (79.3%) genes had only one line of evidence and no gene was found with more than eight lines of evidence. We further merged the Sherlock genes from the three tissues and obtained a union of 398 Sherlock genes (5.6%, Figure 2C) for the following analysis of MegaOR. Similarly, a union of 305 MetaXcan genes (4.1%, Figure 2D) were obtained from three result sets in three tissues for MetaXcan. Collectively, these multidimensional data were organized as the input matrix with 1,668 genes in five dimensions, each representing one kind of disease association evidence. We referred this matrix as the evidence set (ES) genes.
As a control, we generated a second set of genes containing all the protein-coding genes that were expressed in the three CD related tissues, without requiring them to have at least one line of evidence in association with CD. Specifically, we obtained 13,763 protein-coding genes (GENCODE v19) that had an average RPKM (Reads Per Kilobase of transcript, per Million mapped reads) value >1 in whole blood, spleen, or small intestine (terminal ileum) (GTEx v7 data). These genes, referred as tissue set (TS) genes, were considered with very weak support for their potential association with CD. A total of 1,286 genes were shared between the TS genes and the 1,668 genes with evidence. After removing redundancy, we built a second matrix with a union of 14,065 genes (13,763 TS genes expressed in CD-relevant tissues and 1,668 genes with at least one line of evidence in association with CD). We applied MegaOR to both matrices and we expected that MegaOR could prioritize disease genes with or without the TS genes that had weak association evidence.

TSEA to Determine CD Related Tissues
Crohn's Disease causes inflammation of the gastrointestinal tract (Fakhoury et al., 2014). Digestive tissues such as colon and small intestine (terminal ileum) have long been considered to be related to CD (Wu et al., 2007). Among the multidimensional data and methods, Sherlock and MetaXcan both require pre-defined disease relevant tissues. DEGs and DMGs were obtained using blood samples. Hence, only Pascal genes from GWAS data were suitable for the determination of tissues (Pei et al., 2019b). We performed TSEA using Pascal genes defined at different threshold (p < 0.05, p < 0.01, p < 5 × 10 −3 , p < 1 × 10 −3 , p < 5 × 10 −4 , p < 1 × 10 −4 , p < 5 × 10 −5 , p < 1 × 10 −5 , and p < 5 × 10 −6 , Figure 2). As shown in Figure 2, Pascal genes were found to be most significantly enriched in whole blood at different thresholds (e.g., the most significant p-value being 9.75 × 10 −7 when using genes with p Pascal < 0.05), followed by small intestine (terminal ileum) (the most significant p-value being 3.22 × 10 −3 when using genes with p Pascal < 0.005). Both spleen and lung were found to be enriched with Pascal genes. However, considering that spleen acted as a filter for blood as part of the immune system while lung had no obvious link to CD, we selected whole blood, small intestine (terminal ileum), and spleen as the three most relevant tissues to CD and used these tissues for the application of Sherlock and MetaXcan.

Pair-Wise Comparison of the Multidimensional Association Data
To explore the correlation among different dimensional data, we conducted a pair-wise comparison using genes from each group. We used Fisher's exact test to test if any two types of evidence were associated. As shown in Figure 2E, among all possible pairs (n = 15), we only observed a significant correlation between Sherlock and MetaXcan genes (p = 2.63 × 10 −43 ). This is within expectation because both data types measure the integrative signals of genetic variants and their regulatory roles in diseases. Surprisingly, Pascal genes had no correlation with either Sherlock genes (p = 0.95) or MetaXcan (p = 0.98), even though both Sherlock and MetaXcan used the same GWAS data as the input to calculate gene-based p-values. This lack of association implied that there was independent information that could be obtained by integrating eQTL and GReX in interpreting GWAS data, providing a fundamental support to our work of integrating these diverse evidence data. In addition, DEGs and DMGs showed no association with any of the other dimensional data.

CDgenes Identified by MegaOR
To identify a set of candidate genes that have the most intensive load of evidence, we applied MegaOR to the multidimensional evidence data, respectively, the ES matrix with 1,668 genes (each with at least one type of evidence) and the TS matrix with 14,065 genes (the union of the genes expressed in disease-relevant tissues and genes from the ES matrix). We tested eight set sizes separately, i.e., S = 150, 190, 230, 270, 310, 350, 390, 430 for the ES matrix and T = 230, 270, 310, 350, 390, 430, 470, 510 for the TS matrix. For each set size, there were likely different sets of genes reaching the best cOR, even though they have the same number of genes. Thus, we applied MegaOR for each set size 100 times. The average ORs at each set sizes were displayed in Figures 3A,B. Taking the ES matrix as an example, we obtained eight sets of CDgenes. At each size, we selected genes that were retained in more than 50% times ( Figure 3E). We referred the genes at each set size to S1 (set size:  (230), and T8 for n = 510 (235). In both sets, a converged stable status could be observed from S6 to S8 and T7 to T8, respectively (Figures 3C,D). Thus, we suggested that the 235 CDgenes in S7 and the 235 genes in T8 were close to consensus sets of CDgenes that could reach the global maximum load of evidence. Interestingly, the two sets of CDgenes (S7 and T8) shared 234 genes. Thus, we found MegaOR performed relatively stable to generate such consensus sets of candidate genes.

CDgenes Interact With Each Other Significantly
Many disease genes were reported to interact with each other more often than with randomly selected genes, especially genes associated with the same diseases (Barabasi et al., 2011). This was likely because genes underlying the same disease are often involved in related biological pathways. To investigate whether (E) The frequency of genes covered by 100 stable sets at an example size S = 390 in at least on type of evidence set (ES). Genes on the left part of the plot in green were less frequently recovered (<50% occurrence). Genes on the right part of the plot were selected as the CDgenes for the corresponding set size.
our CDgenes tended to interact more often with each other, we curated protein-protein interaction (PPI) data from three sources. The first network was from HumanNet and has been previously used to study GWAS data (Lee et al., 2011). The second network was from a precomputed influence graph that was recently used in cancer (Ding et al., 2015). The third network was a combined dataset of HPRD and STRING (MAGI) (Hormozdiari et al., 2015). For each set of CDgenes, we recorded the number of interactions among CDgene and resampled 10,000 random gene sets, each with the same number of CDgenes. The number of random gene sets that had interactions exceeding the actual number of interactions was used to calculate an empirical p-value. We performed this analysis in each human PPI network, respectively. Interestingly, CDgenes showed significantly more PPIs than those from random gene sets in both HumanNet joint, influence_graph, and MAGI (Figure 4), implying that our CDgenes tended to interact with each other significantly more frequently than expected in random gene sets.
FIGURE 4 | Distribution of protein-protein interactions (PPIs) among CDgenes. The analysis was conducted using the HumanNet joint reference panel (A), the influence graph reference panel (B), and MAGI (C). In each panel, the distribution of the intersections was displayed for 10,000 randomly selected gene sets, each set with the same set sizes as the query set. X-axis is the number of interactions. Y -axis is the frequency of the interactions. In the title of each panel, V denoted the number of CDgenes that were annotated in the corresponding PPI network, E denoted the number of interactions among these CDgenes, and the p-value was the empirical rank p-value. The vertical red line indicates the number of interactions observed for the actual CDgenes. . Y -axis indicated the enriched gene sets with adjusted p-value < 0.05. X-axis is the -log 10 (adjusted p-value). Circle sizes were proportional to the shared CDgenes with genes from the corresponding gene set. Circle color was proportional to the adjusted p-value. (B) Enrichment analysis results using the ClueGO method in Cytoscape (Bindea et al., 2009). Each dot represented a gene or a GO term. Dots in the same color were considered from the same functional group by ClueGO annotation. Gene names were highlighted in red. Each edge indicated the gene was a component gene of the linked GO term.

Functional Enrichment Analysis of CDgenes
To identify the biological roles of the genes in the significant modules, we performed functional enrichment analysis using DAVID (See section "Materials and Methods"). We focused on GO terms and gene sets from the GAD. Our finding showed that the 235 CDgenes in S7 were enriched with MHC class II receptor activity (GO: 0032395, Molecular Function, p = 9.08 × 10 −6 ), immune response (GO: 0006955, Biological Process, p = 1.02 × 10 −14 ), and MHC protein complex (GO: 0042611, Cellular Component, p = 2.85 × 10 −5 ) ( Figure 5A). In GAD, immunological disorders such as Systemic Lupus Erythematosus (adjusted p = 2.52 × 10 −24 ) and Psoriasis (adjusted p = 7.56 × 10 −20 ) were found to be most significantly enriched ( Figure 5A). Importantly, the category "Crohn's Disease" from GAD was also significantly enriched in our CDgenes (adjusted p = 2.44 × 10 −13 ). Evidence of 235 CDgenes was provided in Supplementary Tables S1, S2.

DISCUSSION
In this work, we collected five multi-dimensional data to prioritize CD-associated genes. Using tissue specific enrichment analysis and GWAS data, we determined three tissues that were most related to CD [whole blood, spleen, and small intestine (terminal ileum)]. With these tissues, we calculated integrative association signals between tissue eQTL and GWAS data and conducted tissue-specific TWAS. We constructed two evidence matrices and applied MegaOR to identify a consensus set of CD-associated genes. The candidate CDgenes in this consensus set tended to interact with each other more often than size-matched random genes, indicating these CDgenes could functionally cooperate with each other. Functional enrichment analysis showed that these CDgenes were enriched in immune related diseases and biological processes. Moreover, methods of integrative studies such as MegaOR are powerful tools to unravel the etiology of complex diseases O'Brien et al., 2018). With the increasing volume of omics data, these methods could be easily extended to other complex diseases, such as cancer, psychiatric diseases, and immune diseases.

Function Enrichment Analysis of CDgenes Highlighted COP9 Signalosome
Our consensus CDgenes provided a promising list of candidate genes for CD. The significantly enriched pathways and functional sets suggested that CDgenes were biologically related to CD. In addition, we observed quite a number of promising genes with various types of evidence, such as genes involved in antigen binding (HLA-DOA, HLA-DOB, HLA-DQA2, HLA-DQB1, TAP1, and TAP2) and genes involved in the immune response (NOD2, IFITM1, PSMB8, TXK, and AIM2). Other genes of interest included NCKIPSD (NCK interacting protein with SH3 domain: adjusted p Pascal = 1.00 × 10 −3 , adjusted p Sherlock = 0.037 in whole blood, adjusted p MetaXcan = 0.13 in whole blood), WDR6 (WD repeat domain 6, adjusted p Pascal = 0.029, adjusted p Sherlock = 5.60 × 10 −3 in small intestine (terminal ileum), adjusted p MetaXcan = 0.025 in whole blood), DOCK7 (dedicator of cytokinesis 7, adjusted p Pascal = 2.00 × 10 −3 , adjusted p Sherlock = 2.40 × 10 −3 in whole blood, adjusted p MetaXcan = 2.10 × 10 −3 in whole blood), SPNS1 [Sphingolipid Transporter 1 (Putative), p Pascal = 4.02 × 10 −3 , adjusted p Sherlock = 2.19 × 10 −3 in whole blood, p MetaXcan = 7.43 × 10 −3 ), FLOT1 (flotillin 1, p Pascal = 3.79 × 10 −3, p Sherlock = 0.13 in whole blood, p DEG = 5.87 × 10 −3 ),and HSPA7 (encoding heat shock protein family A (Hsp70) member 7, p Sherlock = 0.14 in whole blood, p DEG = 1.34 × 10 −3 )]. With NOD2, these seven genes were all from the COP9 signalosome (CSN) (53 genes in this term from ClueGO annotation, Figure 5B and Supplementary Tables S3, S4). Interestingly, these seven genes were not the subunits of CSN complex, but they interacted with CSN complex as suggested by affinity purification and mass spectrometry experiment (Fang et al., 2008). CSN is a multi-subunit protease that regulates the activity of cullin-RING ligase (CRL) families of ubiquitin E3 complexes with isopeptidase activity. The major activities that CSN was involved included de-ubiquitination activity and phosphorylation of important signaling regulators in protein kinase activities (Wei and Deng, 2003;Wei et al., 2008). Previous studies have revealed COP9 signalosome subunit 5 (CSN5/Jab1) could regulate the development of immune system in Drosophila (Harari-Steinberg et al., 2007). In mice, deficiency of one subunit of COP9 resulted in dysfunction of paneth cell and colonic enterocyte, which could lead to impaired antimicrobial peptide and might change the composition of intestinal microbiota (Wang et al., 2014). This evidence infers the dysregulation of CSN might impact the intestinal microbiota and lead to pathogenesis of inflammatory bowel disease. In addition, disrupting CSN subunit showed impact in T-cell development and antigen response, indicating CSN might involve in the homeostasis of T cells (Menon et al., 2007;Panattoni et al., 2008). Although the debates continue on that whether microbiota, innate immunity or T cell activation leads to CD, our study shed lights on the potential etiology of CD through the dysregulation of COP9 signalosome. These seven genes were only able to be discovered when integrating multi-dimensional evidence, demonstrating the advance of MegaOR to unveil such signals, which cannot be achieved by traditional single domain approaches.

CDgenes as the Potential Drug Target
Disease associated genes are natural candidates for drug development in both complex disease and cancer (Butcher et al., 2004;Zhao et al., 2015;Lee et al., 2016). We further compared our CDgenes with known target genes of CD meditation using the Therapeutic Targets Database (TTD) (Li et al., 2018).

Limitation
There were some limitations of the current work. First, although we collected five dimensional data, there were still other omics data that were missed in our work. For example, previous studies have reported that copy number variations could be associated with CD (Wellcome Trust Case Control Consortium et al., 2010). However, the number of genes implied by CNV studies were very limited (∼10) and we could not include them into our matrix. Second, due to the limited tissue data, our DEGs and DMGs were both generated using PBMCs from CD patients and samples, instead of disease tissues from the patients. PBMCs are signs of infection and auto-immune diseases (Burczynski et al., 2006). Future studies are warranted to use samples from disease related tissues, such as intestinal biopsies (Wu et al., 2007). Lastly, due to the data heterogeneity, we used different threshold to control FDR for each individual omics data, e.g., adjusted p < 0.05 in selecting DEGs while adjusted p < 0.2 for MetaXcan, Sherlock and DMGs. This inconsistence among different omics data may lead to inaccurate estimate of the actual OR. In future studies, when more data are generated, either from different omics or multiple data sets for the same omics, an enhanced evidence matrix could be constructed to validate the current CDgenes.

CONCLUSION
In summary, we conducted an integrative analysis of genetic, epigenetic, and transcriptomic data in CD. Our approach prioritized candidate genes associated with CD from multidimensional data and such methods could be extended to many other complex diseases with multi-dimensional omics data being available. Functional analysis of these CDgenes revealed strong immune response enrichment. We further highlighted the potential involvement of COP9 signalosome in CD and suggested interactions among our CDgenes with CD drug target genes.

DATA AVAILABILITY
Publicly available datasets were analyzed in this study. The data used in R package "deTS" can be found here: https://gtexportal. org/home/. Other data could be obtained from the resource described in Materials and Methods.

AUTHOR CONTRIBUTIONS
PJ and ZZ conceived and designed the study. YD performed the data preparation and analysis, YD and GP performed the result demonstration. YD, PJ, and ZZ wrote the manuscript. All authors have read, edited, and approved the current version of the manuscript.

FUNDING
This work was supported by the UTHealth Presidential Collaborative Research Award. This work was partially supported by National Institutes of Health grant (R01LM012806). The funders had no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.

ACKNOWLEDGMENTS
The authors would like to extend the gratitude to Dr. Ventham from The University of Edinburgh to share the methylation results with us and to answer our questions.

SUPPLEMENTARY MATERIAL
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fgene. 2019.00318/full#supplementary-material FIGURE S1 | STRING-network interaction of genes.