Identification of the shared genetic architecture underlying seven autoimmune diseases with GWAS summary statistics

Background The common clinical symptoms and immunopathological mechanisms have been observed among multiple autoimmune diseases (ADs), but the shared genetic etiology remains unclear. Methods GWAS summary statistics of seven ADs were downloaded from Open Targets Genetics and Dryad. Linkage disequilibrium score regression (LDSC) was applied to estimate overall genetic correlations, bivariate causal mixture model (MiXeR) was used to qualify the polygenic overlap, and stratified-LDSC partitioned heritability to reveal tissue and cell type specific enrichments. Ultimately, we conducted a novel adaptive association test called MTaSPUsSet for identifying pleiotropic genes. Results The high heritability of seven ADs ranged from 0.1228 to 0.5972, and strong genetic correlations among certain phenotypes varied between 0.185 and 0.721. There was substantial polygenic overlap, with the number of shared SNPs approximately 0.03K to 0.21K. The specificity of SNP heritability was enriched in the immune/hematopoietic related tissue and cells. Furthermore, we identified 32 pleiotropic genes associated with seven ADs, 23 genes were considered as novel genes. These genes were involved in several cell regulation pathways and immunologic signatures. Conclusion We comprehensively explored the shared genetic architecture across seven ADs. The findings progress the exploration of common molecular mechanisms and biological processes involved, and facilitate understanding of disease etiology.


Introduction
Autoimmune diseases(ADs) represent a heterogeneous group of disorders characterized by an immune response against selfantigens, leading to target tissues destruction (1).Genetic factors are widely acknowledged to play an essential role in the pathogenesis of multiple ADs (2).Previous studies document that monozygotic twins have a higher concordance rates of ADs compared to the dizygotic twins, and that siblings are at higher risk than general population (3,4).In addition, these diseases are also comorbid that individuals who are susceptible to one disease have a heightened risk of another disease, with dual diagnoses being more frequent, suggesting that ADs may share the common genetic basis (5,6).
Genome wide association studies (GWAS) have discovered thousands of susceptibility loci related to ADs, and confirmed their polygenic property (4).Particularly, many genetic loci display a common link to multiple ADs.For instance, PTPN22, known as one the strongest risk gene to promote the development of ADs, is particularly related to systemic lupus erythematosus (SLE), rheumatoid arthritis (RA), and type 1 diabetes (T1D) (7).IL2RA is predisposed to multiple ADs including RA, T1D and multiple sclerosis (MS) (8).Additionally, CTLA4 is responsible for various T cells immune regulation, and its blockade results to major fatal autoimmunity (9).Inversely, targeting CTLA-4 pathway binding the costimulatory molecules can be extensively used to treat ADs, such as RA.Therefore, exploration of genetic pleiotropy is essential to reveal common disease mechanisms and develop potential treatments.
To date, the shared genetic and functional profile of ADs have been mostly identified by simple comparison based on univariate or bivariate analyses, which have insufficient power without combining the related phenotypes (10).Recently, combined studies that incorporate several diseases have proven to be a powerful way to determine the genetic overlap existing in several diseases.A cross-disorder analysis reveals new shared loci converging in T cell activation and signaling pathways by combining 9 immune diseases (11).A multi-trait meta-analysis discovered 4 novel loci shared between several autoimmune and allergic diseases (12).Furthermore, Kwak et al. (13) proposed an adaptive analysis on the basis of summary of powered score (MTSPUsSet), that allows identification of gene-level associations by assembling information from multiple phenotypes and SNPs.MTSPUsSet has been employed to recognize potentially genes involved in five psychiatric disorders (14).Accordingly, researchers are dedicated to explore common genetic mechanisms and pathogenesis through multivariate analytical methods.
In this study, comprehensively and systematically multivariate association analyses were performed to elucidate the shared genetic architecture of seven ADs including T1D, SLE, RA, MS, celiac disease (CEL), ulcerative colitis (UC) and primary biliary cirrhosis (PBC), based on GWAS summary statistics from three levels: heritability and genetic correlation, tissue and cell specificity, and pleiotropy.The analysis flowchart was illustrated in Figure 1.

Study samples
GWAS summary statistics were downloaded from Open Targets Genetics for CEL, MS, PBC, RA, UC and SLE (website: https://genetics.opentargets.org/)(15).In particular, the CEL summary statistics was derived from 4533 patients and 10750 controls with 523402 variants (16).The summary statistics for MS comprised 464357 SNPs variants involving 9772 cases from 23 research groups (17).The PBC dataset with 1134141 SNPs variates were derived from a meta-analysis together with another cohort containing 6480 patients and 14736 controls (18).The RA summary statistics included 8254863 variants obtaining from metaanalysis of 12307 patients and 28975 controls (8).The UC summary statistics consisted of 1407735 variants from a meta-analysis of 16315 patients and 32635 controls (19).The SLE dataset comprised 7915251 variants from a meta-analysis with 7219 patients and 15991 controls (20).The summary statistics for T1D involving 8781607 variants included 5913 patients and 8828 controls from a meta-analysis in Dryad (website: https://datadryad.org/stash/)(21).All samples of seven phenotypes were from population of European ancestry.GWAS summary statistics were subjected to rigorous quality control as described in the original publication.The i n f o r m a t i o n o f s u m m a r y s t a t i s t i c s w a s s h o w e d i n Supplementary Table 1.

Linkage disequilibrium score regression analysis
Linkage disequilibrium score regression (LDSC) is a statistical methodology that estimates the SNP heritability (h 2 g ) and genetic correlations (r g ) between traits with GWAS summary data.This method could minimize the impact arising from confounding factors and population stratification (22).This study used the precalculated LD scores based on 1000 Genomes Project's European population to ensure imputation quality.The analysis was conducted using LDSC software, version v1.0.1 (https:// github.com/bulik/ldsc).

Quantification of polygenic overlap using MiXeR
The causal mixture model provided by MiXeR was employed to quantify both the unique and common polygenic components contributing to complex phenotypes, using the GWAS summary data (https://github.com/precimed/mixer).In cross-traits analysis, the bivariate MiXeR model additive effects with 4 independent genetic components based on the principle that only a small subset of variants significantly influences the trait (23), (b 1j , b 2j ) ∼ p 0 N(0, 0) + p 1 N(0, o 1) + p 2 N(0, o 2) + p 12 N(0, o 12), where p 0 is fraction of null SNPs in both traits; p 1 and p 2 are fraction of SNPs having unique impact on the one trait; p 12 is fraction of SNPs having common impact on two traits; in variancecovariance matrix, r 12 represents correlation of the overlap component, s 2 1 and s 2 2 indicate the variance of effective SNPs for the two traits.We used 1000 Genomes Europeans as a reference panel to assess the count of specific and shared effective genetic variants.

Tissue and cell specificity
To identify the tissue and cell specificity regarding seven ADs on the basis of heritability enrichment, we performed stratified-LDSC (24), which partitions heritability into different categories and calculates category-specific enrichments.We performed the tissue specificity analyses based on Genotype-Tissue Expression (GTEx) data that provide insights into gene expression variation across 53 non-diseased human primary tissues (25).In addition, cell specificity analysis was performed utilizing 220 cell annotations from four histone marks: H3K4me1, H3K4me3, H3K9ac and H3K27ac (24).To establish a general overview of the cell types associated with the phenotype, we further divided the 220 cell annotations into 10 groups, representing the system or organ-level structure.The significance thresholds were set at P<0.05/53 = 9.4×10 -4 for tissue specificity, at P<0.05/220 = 2.3×10 -4 for celltype specificity and at P<0.05/10 = 5×10 -3 for cell-typegroup specificity.

Gene-based adaptive association tests
The data underwent multiple processing steps to obtain common SNPs suitable for pleiotropic analyses.SNP pruning based on LD was firstly applied with threshold (50 5 0.1) to eliminate highly correlated Flow chart of study design.
SNPs, and a set of 40957 SNPs were retained.The HapMap 3 CEU genotypes served as reference panel.Subsequently, gene annotation was conducted for the maintained SNPs using the hg38 genome dataset (http://www.genome.ucsc.edu/cgi-bin/hgTables).Eventually, a total of 21897 common SNPs was located in 9886 gene for identifying pleiotropic variants.
The statistic tests for the association of multiple SNPs with single trait, and the association of multiple SNPs with multiple traits on the basis of summary statistics are as follows, respectively: indicates the hth trait in the matrix Z. g 1 and g 2 are adaptive weights for SNP and trait, respectively, and both are greater than or equal to 1.
The adaptive association tests require a matrix of Z scores (Z), correlation matrix of SNPs (R), correlation matrix of multiple traits (V) and the weighting index g.The optimal choice of g, weighting the SNPs or traits, depends on the unknown association patterns, by default, G={1, 2, 4, 8}.The potential pleiotropic genes related to seven ADs were initially detected applying MTaSPUsSet.Subsequently, the aSPUs test was employed to discover genes specifically linked to individual phenotype.P<0.05/9886 = 5.06×10 -6 was considered significant after Bonferroni correction.

GWAS catalog analysis
To investigate whether previous studies reported the identified pleiotropic genes to be associated with seven ADs, we performed a systematic search in GWAS Catalog (https://www.ebi.ac.uk/gwas/) (26).

Functional annotation
To explore putative biological implications of pleiotropic genes, we performed the gene set analyses including Gene Ontology (GO) gene sets and immunologic signatures obtained from MsigDB using software functional mapping and annotation (FUMA) (27,28).Moreover, we performed the protein-protein interaction (PPI) analysis utilizing available STRING dataset (https://string-db.org/), to offer valuable insights into the functional associations and interactions between proteins encoded by these pleiotropic genes (29).

Statistical analysis
The genetic correlation as well as tissue and cell specificity were performed on the basis of all SNPs for each phenotype, and pleiotropic analyses were implemented based on the common SNPs of seven ADs.All statistical analyses were conducted using R 4.0.3 and PLINK 1.9.

Genetic correlations among seven ADs
The SNP heritability (h 2 g ) of seven ADs was first evaluated by univariate LDSC, the estimates of h 2 g varied between 0.1228 and 0.5972 (Figure 2A).Bivariate LDSC was then applied to evaluate genetic correlations across ADs with ranges of 0.185 and 0.721 (Figure 2B).Notably, we discovered multiple significant correlations (P<0.05/21= 0.002), with the strongest genetic correlation for MS-PBC (r g =0.701, se=0.213),followed by PBC-SLE (r g =0.584, se=0.082) and RA-SLE (r g =0.478, se=0.061).Furthermore, the LD score intercepts for seven ADs near 1, indicating the inflation is primarily driven by the polygenic effect.

Genetic overlap across seven ADs
As shown in conditional Q-Q plots, each line displayed leftward separation, indicating polygenic overlap between seven ADs (Supplementary Figure 1).Performing MiXeR analysis, we further quantified the polygenic overlap between ADs and represented the polygenic components as Venn diagrams (Figure 3).There was substantial polygenic overlap for CEL and PBC, sharing 0.21K out of 0.30K causal variants.PBC and UC shared 0.19K out of 0.35K causal variants.Further, CEL and UC also exhibited genetic overlap, sharing 0.17K out of 0.37K causal variants.The amount of polygenic overlap was relatively small in other pairs of diseases.

Tissue and cell specificity for seven ADs
We applied S-LDSC to explore tissue specific enrichment of heritability for seven ADs, using GTEx data for 53 tissues (Figure 4; Supplementary Table 2).We identified substantial heritability enrichment for seven ADs in immune-related tissues of cells Epstein-Barr Virus (EBV) transformed lymphocytes, lung, spleen, and whole blood.Notably, T1D and UC were also significantly enriched in adipose visceral (omentum) and colon transverse, respectively.Additionally, MS, PBC, RA and UC showed higher enrichment in small intestine terminal ileum, although failing to reach Bonferroni significance.
The S-LDSC was extended to explore the cell type specificity for seven ADs.To capture an overview of cell type associated with phenotypes, we firstly evaluated the enrichment in 10 cell groups (Figure 5; Supplementary Table 3).All seven ADs revealed significant enrichment in immune/hematopoietic group only.We then assessed heritability enrichment at cell type level to differentiate the cell type within the group (Supplementary Figure 2, Supplementary Table 4).There were 32 significant cell type enrichments for CEL, 53 for MS, 12 for PBC, 26 for RA, 8 for SLE, 45 for T1D and 7 for UC.The seven ADs were significantly enriched in several immune cells, such as CD4, CD8, CD25, T helper (Th) Th1, Th2, Th17, regulatory CD4+ T (Treg) cells, peripheral blood mononuclear primary and so on.

Potential pleiotropic genes for seven ADs
Based on significant correlation among seven ADs, the adaptive association tests were implemented to aggregate genetic effects for detecting pleiotropic variants.The multivariate analysis of MTaSPUsSet identified a total of 44 potential pleiotropic genes associated with seven phenotypes.Subsequently, the univariate aSPUs test was applied to discover genes specifically related to    individual phenotype.There were 2 significant genes for CEL, 3 genes for PBC, 8 genes for RA, 14 genes for UC, 7 genes for SLE and 5 genes for T1D.Eventually, a total of 32 pleiotropic genes were not only significant in MTaSPUsSet test, but also linked to at least one autoimmune disease identified by aSPUs test.Additionally, based on GWAS Catalog searching for pleiotropic genes, 9 genes had been reported to be related to ADs in previous studies, of which 2 genes were associated with CEL, 4 with MS, 5 with UC, 2 with PBC, 5 with SLE, 5 with RA, and 4 with T1D.Therefore, we detected 23 novel pleiotropic genes.The details of identified genes were showed in Table 1.

Functional annotations of pleiotropic genes
To clarify the biological process involved in pleiotropic genes, we conducted the functional annotations using FUMA.For gene-set analyses, we discovered two significant GO biological processes about the regulation of cell adhesion.In addition, there were three significant immunologic signatures regarding genes up-regulated in natural killer (NK) cell, dendritic cells (DC) and CD4 T cells.The information of significant annotations was showed in Table 2.
Furthermore, PPI analysis was applied to visualize the interaction of pleiotropic genes.We observed a prominent gene cluster containing WWOX (Supplementary Figure 3).

Discussion
This study systematically investigated the genetic architecture of seven ADs in terms of genetic correlation, tissue and cell specificity, and pleiotropy using multiple large GWAS summary statistics.We revealed the high heritability and strong genetic correlations among seven ADs.The SNP heritability was enriched in the immunerelated tissue such as cells EBV transformed lymphocytes, lung, spleen, and whole blood, as well as the immune/hematopoietic related cells.Furthermore, we discovered 32 pleiotropic genes related to seven ADs, 23 of which were considered as novel genes.The pleiotropic genes participated in regulation of cell adhesion and immunologic signatures.Our findings offered potential rationale for genetic mechanisms and provided important evidence for the common genetic architecture of seven ADs.
Genetic analyses suggested genetic factors making a strong contribution to the development of ADs.We identified high heritability among seven ADs, ranging from 0.1228 (UC) to 0.5972 (T1D).These results were consistent with findings detected by Li YR et al (30), which reported the heritability estimates of ADs for pediatric age-of-onset between 0.42 and 0.91, and slightly lower estimates for adult.Moreover, genetic studies documented a familial aggregation and increased concordance in monozygotic, suggesting a remarkable role for genetic factors in pathogenesis of ADs (31).
In addition, we identified several significant genetic correlations among certain ADs.For instance, there was a strong correlation between RA and MS (rg=0.439),consistent with evidence that prevalence of RA in MS patients was assessed to range from 0.35% to 2.4% (32).Moreover, Weng et al (33) studied several comorbid ADs, and found significantly increased occurrence of RA, MS and PBC in patients with inflammatory bowel diseases.And a meta-analysis of 10 pediatric ADs stated that 81% of risk loci were associated with at least two ADs, which indicated a relatively high degree of common genetic predisposition (34).Furthermore, we found and quantified substantial polygenic overlap among multiple ADs using MiXeR.The large proportion of overlap variants reconfirmed strong correlations and shared genetic background.The tissue and cell specificity of heritability enrichment suggested a distinct common etiology for multiple ADs.Across all tissue types, heritability for seven ADs were abundantly enriched in immune system-related tissues including cells EBV transformed lymphocytes, lung, spleen, and whole blood.Similar tissue enrichments for ADs had also been reported in considerable previous studies (35,36).These findings together reflected a shared mechanism that inappropriate or dysregulated immune responses were responsible for triggering multiple ADs.Nevertheless, depending on the diseasespecific context, tissue enrichment may show differentiation (37), as we found that UC was significantly enriched in the transverse colon, whereas T1D was enriched in adipose visceral.In addition, most GWAS associations were attributable to regulatory variations, which were generally cell specific and controlled gene expression in relation to cell state (38).Our cell type-specific analysis revealed the remarkable enrichment in immune/hematopoietic group and various immune-related T cells for ADs.Bourges C et al (39) had demonstrated that regulation loci of primary CD4 T cells was most enriched for immune-mediated disease SNPs.Substantial evidence supported that CD4 T cell subsets, such as Th17 and Th1 cells, were involved in many ADs such as MS, RA and SLE (40).Moreover, Treg cells were a subset of T-cell specialized for immune suppression and maintained autoimmune tolerance by expressing inhibitory receptors and secreting anti-inflammatory cytokines (41).Together, these findings demonstrated the significant role of immune function in pathogenesis of multiple ADs.
The pleiotropic genes further support the common genetic mechanisms across ADs.Among 32 pleiotropic genes detected, 9 pleiotropic genes had been revealed to influence ADs development in previous studies.We identified that PTPN22 was a shared susceptibility gene affecting multiple ADs with an increased risk of T1D, SLE, PBC, RA, and UC.Previous reports had demonstrated that PTPN22 could both inhibit T cell activation through confining downstream signals of the T cell receptor (TCR) and promote type I interferon generation of myeloid cell selectively by enhancing downstream signals of recognition receptors (42).TNFSF4 encoded a membrane-bound protein and served as a T cell co-stimulator (TNFRSF4) ligand (43).Increasing evidence supported that TNFSF4-TNFRSF4 pair could promote the survival and generation of effector T cells as well as memory CD4 + T cells, thereby participating in the occurrence and development of multiple ADs (44).Additionally, the functional annotation analysis found that TNFSF4 participated in controlling of immunologic signatures in dendritic cells and CD4 T cells.Our PPI result also revealed an interaction between PTPN22 and TNFSF4 highlighting the significant function of pleiotropic genes in immune response.Except these several confirmed pleiotropic genes, the present study inspected 23 novel genes utilizing adaptive tests, which may add new evidence for the common genetic mechanism of multiple ADs.For example, PAG1 encoded a transmembrane adaptor protein that negatively regulated immune receptor signaling in T cell, B cells and mast cells, as well as inhibited the formation of immune synapse (45).Evidence from GWAS suggested that PAG1 was related to risk of allergic and inflammatory diseases, such as asthma, although there was no direct proof of an association with AD (46).Additionally, our function results indicated that PAG1 participated in cell adhesion signaling and regulation of dendritic and CD4 T cells.The novel pleiotropic gene WWOX is a cancer suppressor gene that could induce apoptosis and inhibit growth in various cancers.The PPI analysis also showed WWOX interacted with multiple genes.Previous researches showed that WWOX had an important effect on T cells proliferation and FasL expression, and ultimately regulated the T cells apoptosis (47).Generally, loss of WWOX expression could suppress immune attack through producing apoptotic signaling.
In terms of uncovering the common genetic architecture in multiple ADs, we systematically quantified the genetic correlations and assessed functional mechanisms using the large available GWAS summary statistics.Importantly, our gene-level analysis based on MTaSPUsSet has increased power by consolidating the multi-level association information and reducing the burden caused In conclusion, our comprehensive analyses revealed relatively substantial heritability and significantly robust genetic correlations across seven ADs.The SNP heritability of ADs were significantly enriched in immune-related tissue and cells.In addition, we identified 32 pleiotropic genes shared in multiple ADs, 23 of which were novel pleiotropic genes.Our findings contribute to the understanding of genetic mechanisms, and reveal the common pathogenesis among complex diseases.

FIGURE 3
FIGURE 3Venn diagrams of unique and shared polygenic variants across seven autoimmune diseases.The gray presents polygenic overlap between two phenotypes, the blue represents unique variants of first phenotype, and the orange represents unique variants of second phenotype.The numbers indicate the estimated quantity of effective variants (in thousands) per phenotype, explain 90% of SNP heritability in each phenotype, followed by the standard error.The size of the circles reflects the degree of effective variants.The panels beneath Venn diagram represents the genetic correlation calculated by MiXeR, red/blue bars indicate positive and negative correlations, respectively.

FIGURE 5 Cell
FIGURE 5Cell-type-group-specific enrichment of SNP heritability for seven autoimmune diseases using stratified linkage disequilibrium score regression (S-LDSC).The x-axis represents cell types, y-axis represents the log-transformed P-value of coefficient Z scores.Annotations with statistical significance after Bonferroni corrections (P<0.05/10) were plotted in red, annotations with nominal significance (P<0.05) were plotted in orange, otherwise in blue.The horizontal grey dash line indicates P-threshold of 0.05 and horizontal red dash line indicates P-threshold of 0.05/10.

FIGURE 4
FIGURE 4Tissue type-specific enrichment of SNP heritability for seven autoimmune diseases using stratified linkage disequilibrium score regression (S-LDSC).The x-axis represents tissue type, y-axis represents the log-transformed P-value of coefficient Z scores.Annotations with statistical significance after Bonferroni corrections (P<0.05/53) were plotted in red, annotations with nominal significance (P<0.05) were plotted in orange, otherwise in blue.The horizontal grey dash line indicates P-threshold of 0.05; horizontal red dash line indicates P-threshold of 0.05/53.

TABLE 1
The pleiotropic genes identified by the MTaSPUsSet, aSPUs test and GWAS catalog.

TABLE 2
The significant enrichment of functional annotation for the pleiotropic genes using FUMA.Despite the strengths, several limitations should also be considered.First, although MTaSPUsSet identified some novel genes, it was not compared with other existing multivariate methods.Second, the common genetic mechanism underlying multiple ADs was explored only based on the genomic data, and further studies combined with other omics are needed.