Heritability Enrichment of Immunoglobulin G N-Glycosylation in Specific Tissues

Genome-wide association studies (GWAS) have identified over 60 genetic loci associated with immunoglobulin G (IgG) N-glycosylation; however, the causal genes and their abundance in relevant tissues are uncertain. Leveraging data from GWAS summary statistics for 8,090 Europeans, and large-scale expression quantitative trait loci (eQTL) data from the genotype-tissue expression of 53 types of tissues (GTEx v7), we derived a linkage disequilibrium score for the specific expression of genes (LDSC-SEG) and conducted a transcriptome-wide association study (TWAS). We identified 55 gene associations whose predicted levels of expression were significantly associated with IgG N-glycosylation in 14 tissues. Three working scenarios, i.e., tissue-specific, pleiotropic, and coassociated, were observed for candidate genetic predisposition affecting IgG N-glycosylation traits. Furthermore, pathway enrichment showed several IgG N-glycosylation-related pathways, such as asparagine N-linked glycosylation, N-glycan biosynthesis and transport to the Golgi and subsequent modification. Through phenome-wide association studies (PheWAS), most genetic variants underlying TWAS hits were found to be correlated with health measures (height, waist-hip ratio, systolic blood pressure) and diseases, such as systemic lupus erythematosus, inflammatory bowel disease, and Parkinson’s disease, which are related to IgG N-glycosylation. Our study provides an atlas of genetic regulatory loci and their target genes within functionally relevant tissues, for further studies on the mechanisms of IgG N-glycosylation and its related diseases.


INTRODUCTION
Glycosylation is one of the most ubiquitous and essential posttranslational modifications (PTM) for extracellular proteins in eukaryotes, with the addition of linear or branched oligosaccharide sidechains called glycans to the backbones of proteins (1). According to the glycans covalently attached to asparagine, threonine, or serine side chains, they are named either "N-linked" or "O-linked" (2). Based on the well-known asparagine (Asn)-X-Serine (Ser)/threonine (Thr) sequon, a given eukaryotic glycoprotein may have one or more N-linked glycosylation (N-glycosylation) sites (3). In terms of the relatively clear functional domains and the highly conserved glycosylation site at the equivalent position of Asn-297 of each heavy chain across mammalian species, immunoglobulin G (IgG) has been regarded as an ideal N-glycoprotein model for researching N-glycosylation (4,5).
N-Glycan is initially synthesized from a lipid-linked, oligosaccharide moiety (Glc3Man9GlcNAc2-P-P-dol) on the lumen side of the endoplasmic reticulum (ER) and transferred to the nascent polypeptide chains in the ER. N-glycan is then conservatively trimmed to a core moiety (Man5GlcNAc2-Asn) by a series of exoglycosidases in the ER before transfer to the Golgi apparatus for the following optional glycan assembly (6). Assembly of the glycan-extended tree is controlled by multiple exoglycosidases and the Golgi-localized glycosyltransferases, resulting in a wide variety of oligosaccharide structures showing high species specificity (7). At present, almost 200 glycosylationrelated genes have been identified in the human genome (summarized in GlycoGene Database (GGDB, https://acgg.asia/ ggdb2/) (8), representing approximately 1% of all human genes. However, glycan branching in the Golgi is highly dependent on microenvironment, such as tissue-specific regulation of the expression of glycoenzymes along the Golgi assembly line. Due to the lack of N-glycan profiling data for particular tissues, even to the best-known glycoprotein, human IgG, it remains unclear whether its N-glycosylation is regulated differentially across multiple tissues and how tissue-specific regulation contributes to its diverse N-glycosylation.
GWAS have identified over 60 susceptibility loci associated with the alternative N-glycan peaks (N-GPs) of IgG, which is the qualification and quantification of enzymatically released Nglycans by ultra-performance liquid chromatography (UPLC) after the IgG is isolated from plasma (9)(10)(11)(12). Four of the 200 glycogenes (8) are located in these identified GWAS loci, including FUT6, FUT8, B4GALT1, and MGAT3, implying their contribution to the alternative IgG N-glycosylation. However, over 90% of identified GWAS hits are difficult to characterize biologically due to the pitfalls of GWAS approach, e.g., very small effect size, within the noncoding region, pleiotropic, and/or noncausative (13). Thus, a large number of functionally relevant genes underpinning these GWAS associations of IgG N-glycosylation remain unidentified.
Furthermore, immune cells, e.g., plasma cells which synthesize and secrete IgG, are highly motile between blood and lymphatic circulation, traveling around the lymphoid nodes and mucosa-associated lymphoid tissues (MALTs), a diffuse lymphoid tissue system found in submucosal parts of the body (e.g., gastrointestinal tract, nasopharynx, thyroid, breast, lung, salivary glands, and skin), throughout the body to reach a site of inflammation (14). On amount of the existence of tissue-specific gene expression (15) and the limitation that only plasma IgG has been investigated in population-based studies for the genetic effect of IgG N-glycosylation, it is still unclear whether or not the N-glycosylation of IgG is regulated differentially among multiple MALTs. Likewise, how the genetic susceptibility of quantitative trait loci (QTL) identified by GWAS affects IgG N-glycosylation through the tissue-specific regulation of gene expression remains unknown. Recent genomic/transcriptomic-based statistical approaches (16,17) may help to shed light on the complicated mechanisms of IgG N-glycan biosynthesis, especially concerning tissue-specific regulation.
In the present study, to identify genetically regulated genes associated with IgG N-glycosylation traits across the multitude of tissues, we leveraged the data of GWAS on IgG N-glycosylation from 8,090 participants of European ancestry (11) and the data from a large-scale expression QTL (eQTL) study, i.e., Genotype-Tissue Expression of 53 types of tissue (GTEx v7) (18). We first conducted a linkage disequilibrium scores for the specific expression of genes (LDSC-SEG) (16) to filtrate which tissues are most likely to be enriched with each specific IgG N-GPs having significant GWAS results (20 out of 23 GPs have significant GWAS hits). To avoid the tissue bias in following transcriptome-wide association study (TWAS), we selected corresponding tissue types in expression panels of functional summary-based imputation (FUSION), matched with the LDSC-SEG screening tissues and the LDSC-SEG significant IgG N-GPs for TWAS analyses (17). To investigate whether the significance of TWAS hits resulted from the regulation of gene expression or a genetically associated effect, we conducted joint and conditional analyses on each TWAS hit. We next explored the biological pathways of candidate genes from TWAS and accessed the network of corresponding gene sets by protein-protein interaction (PPI) analysis within IgG Nglycosylation-related tissues. At last, we retrieved the single nucleotide polymorphisms (SNPs) underlying the TWAS hits in phenome databases to discover the complex traits and diseases sharing genetic susceptibility with IgG N-glycosylation. A schematic analysis plan of our study can be found in Figure 1.
The study aimed to characterize the genetic predispositions of IgG N-glycosylation in their enriched tissues, thus extending our understanding of the genetic regulation of IgG N-glycosylationrelated gene expression in the corresponding tissues and to determine their association with susceptibility genes for IgG Nglycosylation-related traits and diseases.
To be consistent with our previous studies, we used the Zagreb code (GP1-GP24) for naming each GP profiled by UPLC in the IgG N-glycome. Detailed naming and compositional information of GPs were given in a previous report (9,20) and are listed in Supplementary Table 2. Meanwhile, being consistent with public FIGURE 1 | A schematic analysis plan in the study. Leveraging IgG N-glycosylation genome-wide association studies (GWAS), summary statistics, and gene expression datasets: (1) to filtrate the tissues enriched in IgG N-glycosylation signal, (2) to identify the genes most significantly associated with IgG N-glycosylation features, and (3) to investigate the diseases or phenotypes involved with IgG N-glycosylation. data from GTEx, we used the same tissue names as those in GTEx. Of the total 24 GPs, significant GWAS summary statistics of 20 GPs were investigated by LDSC-SEG. Whereas the remaining four GPs (GP1, GP3, GP5, and GP21) were excluded due to a lack of GWAS statistical significance.
Transcriptomic Dataset for Linkage Disequilibrium Score Regression of Specifically Expressed Genes in LDSC-SEG Integrating GWAS with large-scale functional genomic data has been proposed as an effective approach to characterize the functional effects of associated genetic variants, especially for cross-tissue study (21). LDSC-SEG provides a solid bioinformatics tool for discerning which tissues or cell types are most relevant to a particular disease or health phenotype (16). Therefore, LDSC-SEG is able to perform the tissue enrichment analyses for a specific phenotype by integrating stratified linkage disequilibrium score regression from GWAS summary statistics with tissue-specifically expressed gene sets in a huge volume of gene expression data (22). Data for LDSD-SEG analysis were prepared as described in a previous study (https://alkesgroup.broadinstitute.org/ LDSCORE/) (16). A total of 53 multitissue samples from GTEx v7 were included in this LDSC-SEG analysis (https://github.com/ bulik/ldsc) (18).
Through TWAS analysis, the relationships between SNPs and gene expression levels were first obtained to build-up reference panels composed of predictive models. These models were then used to predict trait-associated gene expressions, via the statistically significant SNPs from the GWAS summary statistics of an interesting trait based on a large independent cohort (17,25).
The FUSION method was performed to estimate heritability, build predictive models, and identify transcriptome-wide associations. By FUSION, the associations between the IgG N-GPs and the expression levels of candidate genes in corresponding tissues were identified via the coassociated genetic variants (as SNPs) which are identified as statistically significant in GWAS (as QTLs) and also in GTEx (as eQTLs).
Transcriptomic imputation (TI) in FUSION method was conducted using eQTL reference panels which were derived from tissue-specific gene expression coupled with genotypic data. In the current study, 27 tissues were identified as relevant with IgG N-glycosylation by LDSC-SEG, while three tissues were unavailable in FUSION. Hence, 24 tissue panels were used as TI reference panels, while the 1,000 Genomes v3 LD panel (http:// ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/) was hired as an LD reference. A Bonferroni-corrected study-wise threshold was calculated by p = 0.05/number of genes in each panel (Supplementary Table 2). As previous GWAS on IgG N-GPs reported that immune tissues are relevant to IgG Nglycosylation (10, 11), we employed three immune tissues, i.e., spleen, whole blood, and Epstein-Barr virus (EBV)-transformed lymphocytes as a complementary strategy of expression panel selection in FUSION, along with 20 GWAS-significant IgG N-GPs to conduct a parallel TWAS analysis.

Identification of IgG N-Glycosylation Relevant Tissues by LDSC-SEG
The linkage disequilibrium score for the specific expression of genes (LDSC-SEG) is a computational approach to identify phenotype-relevant tissues using stratified LD score regression (https://alkesgroup.broadinstitute.org/LDSCORE/) (16). In a given tissue, if there is an enrichment of the highest specific expression in the regions surrounding the heritability of a disease/phenotype, it will support the likely correlation between this disease/phenotype and this tissue (16). By this approach, we investigated 53 tissues from the GTEx project (http://gtexportal.org/) (18), to designate the tissues relevant to IgG N-glycosylation features. All procedures follow the tutorial in Github (https://github.com/bulik/ldsc).

Performing TWAS on IgG N-Glycosylation GWAS Dataset
TWAS has been proposed as a robust tool to integrate GWAS summary statistics, cis-SNP-expression effect sizes, and LD reference panels to evaluate the association between the cisgenetic element of expression and disease/phenotype (28,30,31). Thus, using the colocalized SNPs between GWAS statistics and eQTL data as linkers, TWAS is able to identify the candidate genes for the potential mechanism underlying the variantdisease/phenotype associations, which are challenging for GWAS approach.
In the present study, we conducted FUSION tool (http:// gusevlab.org/projects/fusion/) (17) for each transcriptome reference panel designated by an LDSC-SEG approach. Briefly, to estimate the heritability of each gene expression, we first conducted a robust version of Genome-wide Complex Trait Analysis-Genome-based restricted maximum likelihood (GCTA-GREML). This step generated the heritability estimates of expression for each gene with the p of the likelihood ratio test.
FUSION has created five different models to calculate the predictive weights of expression or intron usage: best linear unbiased prediction (blup), Bayesian sparse linear mixed model (bslmm), LASSO regression (lasso), Elastic Net (enet), and top SNPs (topl). After weighting, the cross-validation for each of the desired models was performed. The model gaining the best cross-validation prediction accuracy was chosen and the corresponding predictive expression or intron usage was correlated to IgG N-glycosylation GWAS summary statistics to conduct TWAS and filtrate significant associations. The significance for heritability estimates of the genes or intron usage at a Bonferroni-corrected p < 0.05 were reported as TWAS hits. Accounting for more suggestive information on gene coexpression, we also applied an FDR of 5% within each expression reference panel to obtain a bigger risk gene set for pathway analysis.

Joint and Conditional Testing GWAS Signal Analysis
Using the postprocess module in FUSION (http://gusevlab.org/ projects/fusion/), joint and conditional testing methods were performed to determine the contribution of gene expression association in each significant TWAS hit. After the weight of gene expression was removed, the residual TWAS signal was recalculated and evaluated with genome-wide Bonferroni correction. The testing region was defined by the transcribed region of the genes. In each testing, every association of GWAS was conditioned upon the joint gene model by one SNP.

Colocalization Analyses and Functional Annotation
Using coloc approach (37), the colocalization analyses were conducted to strengthen the detection of candidate genes at IgG N-glycosylation GWAS loci by hunting the evidence of shared causal variants between functional eQTL traits and GWAS traits. The colocalization test converts correlation statistics to effect size based on the sample size of the study for a given function, i.e., gene expression. Then, under the assumption that the standard error approximation is inversely proportional to the square root of the sample size, the approximate colocalization effect size is calculated. The statistics of posterior probability (PP) were presented for the five hypotheses (PP.H0: unrelated; PP.H1: only functionally relevant; PP.H2: only GWAS relevant; PP.H3: independent function/GWAS related; and PP.H4: colocalized function/ GWAS related). The current study mainly concerned the PP.H4, which represents the posterior probability that GWAS significant signal and eQTL locus are the same locus, ranging from 0 to 1, where 0 means 0% probability and 1 means 100% probability. Colocalization is declared if the posterior PP.H4 for the model with a shared causal variant exceeded 0.750.

Phenome-Wide Association Studies and Genetic Correlation Investigation
To identify more diseases/phenotypes associated with the most significant eQTL of each TWAS gene, we performed a phenomewide association study (pheWAS) on each leading SNP, by leveraging the public data in the GWASAtlas (https://atlas.ctglab. nl) (41). Based on p-values, the top 5 phenotypes (excluding IgG N-glycosylation and repeated phenotypes) were presented. The genetic correlation between the diseases/phenotypes identified by pheWAS was determined by LDSC, using available data in the GWASAtlas. We utilized the most recent GWAS data (i.e., until 2020) for analysis. A Bonferroni correction was utilized to adjust the significance threshold with the number of tested traits.

Identification of IgG N-Glycosylation-Relevant Tissues by Heritability Enrichment of Expressed Genes
To better comprehend how peripheric tissues affect IgG N-GPs and to avoid tissue bias in following TWAS, we conducted a tissue enrichment analysis using the LDSC-SEG method to leverage the newest IgG N-glycosylation GWAS summary statistics (11) and eQTL data from the GTEx consortium (v7) (15,18). The study comprised 48 tissue types (n = 80 to 491, Supplementary Table 1) with a 5% false discovery rate (FDR) threshold.
Seventeen GPs were enriched in 27 of the 53 types of tissue in the gene expression panels of GTEx v7 at a 5% FDR threshold (-log 10 p > 1.32) (Supplementary Table 3 For the three immune tissues selected as the complementary strategy, only three GPs including FA2 [6]G1 glycan (GP8), FA2 [6] BG1 glycan (GP10), and FA2G2S1 glycan (GP18) were enriched in spleen and whole blood, but no enrichment was found in EBVtransformed lymphocytes (Supplementary Table 3). These results indicated that genetic effects of IgG N-GPs on the regulation of gene expression are tissue selective, and more likely to be enriched not only in the tissues of immune organs but also broadly in multiple MALTs across peripheral organs, such as skin, lung, breast mammary tissue, small intestine, esophagus muscularis, testis, and uterus.

Constructing a List of Tissue-Dependent Associations Between Genes and IgG N-Glycan Traits
To identify the cis-regulated genes associated with IgG N-GPs within their functionally relevant tissues, we conducted a TWAS analysis using the FUSION method (see URLs in the Data Availability Statement), in terms of the same IgG Nglycosylation GWAS summary statistics (11) and the reference transcriptome panels derived from GTEx v7 (15,18). In total, 116,076 features of tissue-specific gene expression were tested (see details in Supplementary Table 1). Based on the primary strategy of expression panel selection by LDSC-SEG (24 tissue reference panels out of 27 tissue types enriched by LESC-SEG are available in GTEx v7), 90 sets of IgG N-GP/SNP/gene expression association reached the threshold of Bonferroni-corrected significance within each tissue reference panel. After gene annotation by querying in the Metascape portal (42) (see URLs in the Data Availability Statement), 55 genes were confirmed as TWAS results, being significantly associated with 11 IgG N-GPs across 14 tissues ( Figure 3 and Table 1).
According to the three immune tissues selected by the complementary strategy, 22 statistically significant TWAS hits were obtained with the threshold of Bonferroni correction, nine of which were also identified in the 55 TWAS hits based on the primary strategy. The LDSC-SEG strategy obtained more information from candidate genes in the more functionally relevant tissues and covered the majority result from the three immune tissue strategy (OR is not estimable, p < 1.0E−05, Fisher's exact test). Furthermore, in terms of hiring the most mechanistically related tissue reference panels in TWAS analysis that could reduce the reference bias (21), we therefore chose the results from a primary strategy based on tissue enrichment as the final TWAS result for further analyses. Of the 55 candidate genes identified by TWAS, 12 genes in eight regions (B4GALT1: 9p21.1; COG7: 16p12.2; FUT8: 14q23.3; GPANK1: 6p21.33; GSDMB: 17q21.1; HLA-C: 6p21.33; HLA-DRA: 6p21.32; KDELR2: 7p22.1; MGAT3: 22q13.1; ORMDL3: 17q21.1; RPL3: 22q13.1; and TAB1: 22q13.1) are in known IgG N-glycosylation GWAS susceptibility loci (9-11). Forty-three genes at 10 novel regions and six known regions were for the first time associated with IgG N-glycosylation ( Table 1).
Based on the results from 27 IgG N-GP-enriched tissue types and 17 corresponding GPs, TWAS eventually identified 14 tissues significantly specific to 11 IgG N-GPs, and meanwhile the expressions of 55 genes were linked to their corresponding IgG N-GPs ( Figure 4). Checking for the IgG N-glycan traits based on the chemical and structural properties of glycans (detailed chemical and structural information of GPs was given in previous reports (9,20) and is presented in Supplementary  Table 2), all 14 types of specific tissue and 45 genes contributed to galactosylation, including eight tissues and 31 genes contributing to monogalactosylation, and 8 tissues and 17 genes contributing to digalactosylation; 13 of 14 tissues and 39 genes were identified as involved in fucosylation; 9 tissues and 30 genes affected sialylation, including 7 tissues and 28 genes to monosialylation, and 3 tissues and 3 genes to disialylation. Only 5 tissues and 14 genes were significantly associated with bisecting GlcNAc (Table 1 and Supplementary Table 2).
From the confirmed 90 sets of "GP-SNP-gene expression" associations, three scenarios of genetic variants affecting IgG Nglycosylation traits were observed: (1) A single SNP is within or associated with a single gene and a single GP within one corresponding tissue, showing the tissue-specific effect of the genetic variant for a certain GP. In this scenario, 40 associations were observed ( Table 1 SNPs with asterisks). (2) A single SNP is within or associated with a single gene or multiple genes and simultaneously associated with a single or multiple GPs within multiple corresponding tissues. This suggests pleiotropic effects of the genetic variants for different GPs or genes among multiple tissues (accounting for 44 observed associations, Table 1 SNPs with daggers). (3) A single SNP is associated or located in a single gene but associated with multiple GPs within only one tissue, indicating that the genetic variant is coassociated with these two GPs while they were highly correlated with each other via coassociation with the same gene (accounting for six observed associations, Table 1 SNPs with section symbols). For example, rs761830 is correlated with A2 glycan (GP2) (Z = −11.70, p = 1.34E−31) and FA2G2 glycan (GP14) (Z = 4.52, p = 6.22E−06), linked with FUT8 gene in skeletal muscle in two reverse directions. In such a case, the corresponding SNP is likely to regulate its target genes in two opposite directions to influence these two GPs ( Table 1).

IgG N-Glycosylation TWAS Hits Are Driven by Genetic Regulation on Gene Expression
As multiple TWAS hits overlapped with the significant results of previous IgG N-glycosylation GWAS, we conducted joint and conditional analyses to address how much GWAS signal remains after the association of the functional annotation is removed. The postprocess module in FUSION was performed to report the statistics for the jointly significant genes.
Our results demonstrated that all known IgG N-glycosylation GWAS susceptibility loci could be explained entirely or mostly by the expression of corresponding genes identified in TWAS, supporting the concept that these TWAS hits were mainly driven by the genetic regulation of gene expression in these loci (Supplementary Table 4 and Supplementary Figure 2). For instance, association analysis conditioning on the expression of FUT8, which depended on the associations between expression SNPs (eSNPs) and A2 glycan (GP2) in skeletal muscle, showed expression-driven signals in a previously implicated GWAS locus and explained 59.1% of the variance (rs761830 lead SNP GWAS p = 1.32E−54, conditional lead SNP GWAS p = 2.49E−23) ( Figure 5A). By the associations of eSNPs with FA2G2 glycan (GP14) in skeletal muscle, the expression level of FUT8 explained 94.7% of the variances at this locus (rs761830 lead SNP GWAS p = 6.83E−06, conditional lead SNP GWAS p = 3.00E−01) was observed ( Figure 5B). The joint and conditional analyses for WNT3 completely explained the variance of the locus by 100% on chromosome 17 through the associations of eSNPs with FA2[6]BG1 glycan (GP10) in the small intestine terminal ileum (rs199438 lead SNP GWAS p = 1.3E−07, conditional lead SNP GWAS p = 1) ( Figure 5C).
Similarly, we performed joint and conditional analyses on the remaining 43 novel IgG N-glycosylation TWAS hits with the expression of corresponding genes. The result showed that the majority of these TWAS hits were also mostly driven by the regulatory effect of genetic variants on the expression levels of targeted genes (Supplementary Figure 3 and Supplementary Table  5). For example, association analysis conditioning on the expression of EVI5 which depended on the associations between eQTLs and A2 glycan (GP2) in the skeletal muscle panel, demonstrated that expression-driven signals in this novel IgG N-glycosylation locus explained 49.9% of the variance (rs169201 lead SNP GWAS p = 5.17E −10, conditional lead SNP GWAS p = 1.09E10-05) ( Figure 5D).   Figure 5F).

Colocalization of TWAS Signals Provides Evidence of Causality
To strengthen the detection of candidate genes as potential causal genes, the colocalization analyses were conducted between total 90 TWAS hits and all corresponding GTEx v7 eQTLs (15), using "coloc" package (37). The colocalization results for this current study, the posterior probability for the fifth hypothesis (colocalized function/GWAS related), are given in the last column of Table 1. Using 0.750 as the cutoff value here, 38 out of a total 90 TWAS hits obtained significant possibilities for the colocalization shared between GWAS association and eQTL association ( Table 1). It is particularly important for the TWAS hits in the extended patterns of LD regions, such as HLA region on chromosome 6. Only 14 out of 40 TWAS hit in HLA region had the significant possibilities for colocalization, hinting that the extended patterns of LD regions are still challenging for TWAS approaches. Half of TWAS hits in nonextended patterns of LD regions were declared as shared causal variants with supportive posterior possibilities.

Functional Annotation
We used HaploReg v4.1 to perform the functional analysis of a total of 18 lead eQTLs showing the best colocalized associations with IGPs in both TWAS and GWAS results. Among them, rs2297256 was in the 3′-UTR, while rs2516412, rs2534680, rs3101018, rs761830, and rs1005522 were located in the upstream transcript regions, and the remaining 12 lay in the intronic regions (Supplementary Table 6).  FIGURE 4 | The atlas of TWAS statistically significant genes regulating IgG N-glycosylation within specific tissues. The left column lists the candidate genes of IgG N-glycosylation. The middle column comprises the tissues enriched to the candidate genes of IgG N-glycosylation. The various types of brain tissue from the GETx consortium (v7) were allocated to the "brain" tissue group to reduce FDR. The right column indicates the IgG N-glycosylation traits associated with candidate genes within relevant tissues. F, core (if the first letter) or antennary fucose; A2, biantennary; B, bisecting N-acetylglucosamine; Gx, galactose; Sx, sialic acid; x, number of galactoses or sialic acids in a glycan structure. Detailed naming and compositional information of GPs was given in previous reports (9,20) and is listed in Supplementary According to the data from the Encyclopedia of ENA Elements (ENCODE) project (43), 16 out of the total 18 lead eQTLs (except rs169201 and rs199520) were identified in strong promoter or/and enhancer activity regions; 15 of the total 18 (except rs1144709, rs169201, and rs199520) in DNAse hypersensitivity site regions; rs2516421, rs2534680, rs3101018, and rs17198191 in transcription factor-binding regions; 14 of the total 18 (except rs3130484, rs761830, rs9303281, and rs17689471) in the regulatory motifs (Supplementary Table 6).

Gene Set Enrichment and PPI Network Analysis
To investigate how many biological pathways may be potentially relevant with the genes identified by TWAS, enrichment analyses for GO, KEGG pathways, and Reactome were performed using the STRING database (see URLs in the Data Availability Statement). A total of 55 candidate genes identified by TWAS were significantly enriched in 42 gene sets focusing on three functional processes: glycosylation (10 gene sets), immune response (23 gene sets), and protein translation (9 gene sets) ( Table 2).
The ten significantly enriched glycosylation-relevant gene sets were mainly involved in the pathways of N-glycan biosynthesis, e.g., N-glycan biosynthesis, complex type (Mann-Whitney U test, p FIGURE 5 | Regional association of TWAS hits. (A) Chromosome 14 regional association plot for GP3 in skeletal muscle. (B) Chromosome 14 regional association plot for GP14 skeletal muscle. (C) Chromosome 17 regional association plot for GP10 in the small intestine terminal ileum. (D) Chromosome 1 regional association plot for GP3 in skeletal muscle. (E) Chromosome 5 regional association plot for GP3 in the small intestine terminal ileum. (F) Chromosome 21 regional association plot for GP19 in brain hypothalamus. The top panel in each plot highlights all genes in this 1-Mb window. The marginally significant genes identified by TWAS are colored in orange, and the jointly significant genes are highlighted in green. The bottom panel shows a Manhattan plot of the GWAS data before (grey) and after (blue) conditioning on the predicted expression of the green genes. rejection (p = 5.60E−04), graft-versus-host disease (p = 5.60E −04), and several specific immunological diseases, e.g., systemic lupus erythematosus (SLE) (p = 1.66E−04), autoimmune thyroid disease (AHD) (p = 1.82E−04), viral myocarditis (p = 2.50E−04), and leishmaniasis (p = 5.50E−04). At last, nine gene sets were related to protein translation, e.g., tRNA aminoacylation (p = 9.05E−05), amino acid activation (p = 1.44E−04), and tRNA metabolic process (p = 7.47E−03) ( Table 2). GO enrichment on TWAS hits strengthened several pathways which are biologically relevant to IgG N-glycosylation. The known glycosyltransferase genes (8) in our TWAS hits, including FUT8, B4GALT1, MGAT3, KDELR2, and COG7 were enriched in the processes of asparagine N-linked glycan biosynthesis, transport to the Golgi and subsequent modification, and N-glycan antennae elongation in the medial/trans-Golgi glycosylation. These processes have been implicated in the physioregulation of IgG Nglycosylation (44,45). By KEGG pathway analysis, six genes, i.e.,  Table 2), demonstrating their core functions in the pathways.
Through these core genes, these pathways are highly associated with cytokine-cytokine receptor interaction (hsa04060), antigen processing and presentation (hsa04612), T-cell receptor signaling pathway (hsa04660), B-cell receptor signaling pathway (hsa04662), and leukocyte trans-endothelial migration (hsa04670). These pathways play crucial roles in the appropriate functioning of all immunoglobulins, especially for the most abundant type in plasma, i.e., IgG. From Reactome (see URLs in the Data Availability Statement) enrichment, three gene sets are enriched into three Nglycosylation pathways, i.e., asparagine N-linked glycosylation (HSA-446203), transport to the Golgi and subsequent modification (HSA-948021), and N-glycan antennae elongation in the medial/trans-Golgi (HSA-975576).
To validate whether these TWAS-identified candidate genes associated with IgG N-glycosylation were inclined to be coexpressed within corresponding tissues, we measured the protein-protein interaction (PPI) network for the connectivity of the genes by GeneNetwork v2.0 (see URLs in the Data Availability Statement). The genes were clustered based on their coexpression of public RNA-seq data (n = 31,499). Most genes identified by TWAS within each specific tissue reference panel demonstrated coexpression (Supplementary Figure S4). For example, skeletal muscle contributed the most numerous significant TWAS hits across all tissues by 24 candidate genes, while 20 genes were clustered into one network with ERBB2, PGAP3, and PNMT demonstrating strong coexpression (Supplementary Figure 4). In small intestine terminal ileum, all 18 genes identified by TWAS show coexpression, in which PGAP3, GSDMB, HLA-S, CCHCR1, VARS2, and KANSL1 are intensively coexpressed (Supplementary Figure 4). Therefore, the coexpressed gene sets within corresponding tissues could support the result of IgG N-glycosylation TWAS analyses based on tissue enrichment by LDSC-SEG.

Phenome-Wide Association Study and Genetic Correlation Analysis
To identify other phenotypes which are likely to be associated or comorbid with IgG N-glycosylation, we conducted a pheWAS for each IgG N-glycosylation eQTL in the GWAS database based on a European population (41). Since all eQTLs are associated with IgG N-glycosylation, we chose to exclude and remove the duplicated phenotypes related to the same eQTL from the result list, in order to emphasize the remaining top 5 phenotypes ranked by p for each eQTL. In total, nearly 100 phenotypes were identified as significantly associated with these eQTLs, including anthropometric health measurements (weight, height, blood pressure, and blood cell count), immune and metabolic diseases (SLE, inflammatory bowel disease (IBD), rheumatoid arthritis (RA), primary sclerosing cholangitis (PSC), and type 2 diabetes (T2D)), and neurological and psychiatric disorders (Parkinson's disease (PD), schizophrenia, and bipolar disorder) (Supplementary Table 7).
To reconfirm the pheWAS results, we investigated the genetic correlations between these phenotypes using the most recent GWAS data from the UK Biobanks by Multi-marker Analysis of GenoMic Annotation (MAGMA), and SNP heritability, and genetic correlation with LDSC (41). Most of these diseaserelated phenotypes were implicated in previous GWAS studies (46)(47)(48)(49)(50)(51). By analyzing the genes assigned to each significant SNP within a 1-kb window from both sides with default parameters (SNP-wise mean model) (52) and the gene set defined by MSigDB v.6.1 (53), the MAGMA results showed strong correlations between height, waist-hip ratio (WHR), systolic blood pressure (SBP), PD, and IBD with IgG N-glycosylation (p < 2.5E−06) ( Figure 6A). Through the calculation of the SNP heritability and pairwise genetic correlations by LDSC (22), genetic correlations were calculated between the GWAS of disease-related phenotypes. The results showed strong positive correlations between IgG N-glycosylation, SBP, triglyceride cholesterol (TC), T2D, IBD, PSC, Crohn's disease (CD), and ulcerative colitis (UC), consistent with the above pheWAS results ( Figure 6B).

DISCUSSION
In this study, we conducted a systematic transcriptome-wide association study, combining the analysis of LDSC-SEG and TWAS to gain insight into the tissue specificity and tissuedependent genetic effect of IgG N-glycosylation, based on summary statistics of the most recent IgG N-glycosylation GWAS on 8,090 individuals of European ancestry. In so doing, we addressed fundamental questions regarding the heritable tissue enrichment of IgG N-glycosylation and constructed an atlas of tissue-dependent associations between genes and IgG Nglycan traits.   By the enrichment of 17 IgG N-GPs in 27 biologically relevant tissues in our study, we have obtained strong genetic evidence that most MALTs, e.g., skin, lung, breast mammary tissue, small intestine, esophagus muscularis, testis, and uterus, are enriched to specific IgG N-glycan traits ( Figure 2). It has been established that the IgG secreting cells (i.e., plasma cells) mature from B lymphocytes and are translocated from primary lymphoid tissue (bone marrow) to secondary lymphoid tissue which consists of the lymph nodes, spleen and MALT for secreting IgG and initiating adaptive immune responses (14). However, previous studies measured the N-glycosylation of IgG in humans mainly isolated from plasma, obtaining the average profile of whole IgG N-glycome, and therefore a lack of information about the functionally relevant tissues. A wet experiment on laboratory animals provided evidence to support the assumption that IgG against commensal gut bacteria can be synthesized and deposited locally within MALTs in organ-cultured pig small intestinal mucosal explants (54).
Using IgG as a model glycoprotein, our study demonstrates the evidence of genetically regulated gene expression for the tissue selectivity of protein N-glycosylation in eukaryotes. Eukaryotic N-glycosylation in the ER and Golgi apparatus is highly complicated, due to a variety of exoglycosidase and glycosyltransferase reactions. The tissue-selective manifestation of protein N-glycosylation has been observed in recent studies, e.g., between paired tumorigenic and adjacent nontumorigenic colon tissues in humans (55), and between sites within the same proteins from liver and brain tissues in the mouse (56). As a simple glycoprotein, IgG usually contains only one Nglycosylation site in the constant heavy chain region and the N-glycan moieties of IgG have no more than two antennae. But, in fact, hundreds of forms of glycans have been observed at this single N-glycosylation site. Due to the limitation of sampling from diverse human health tissues for N-glycosylation profiling, there is still a lack of evidence from the N-glycome level for the tissue specificity of IgG N-glycosylation. The results of tissue enrichment in our study have bridged the gap and exhibited tissue-selective manifestation for IgG N-glycosylation, leading to the next hypothesis that certain N-glycans of IgG may specifically be modified within some tissues and be affected by  tissue-specific environments. Consequently, the current study advances the knowledge of tissue specificity on human IgG Nglycosylation and in turn increases the statistical power for the following TWAS analysis of candidate gene prioritization (21). In this current study, we demonstrated that the TWAS approach is able to discover more IgG N-glycosylation-related genes without any prior information. We identified 12 candidate genes in eight regions are in known susceptibility loci reported by previous IgG N-glycosylation GWAS (9)(10)(11). Furthermore, we discovered 43 genes at 10 novel regions and six known regions for the first time to be associated with IgG N-glycosylation. For the known genome loci, TWAS discovered more candidate genes whose expressions in specific tissue are likely related to IgG Nglycosylation by integrating GWAS signals with eQTL knowledge, for example, C4A, C4B, CYP21A2, and GPANK1 in 6p21.33 for FA2G1S1 glycan (GP16) in skeletal muscle. In addition, TWAS explored candidate genes in novel genomic loci, e.g., EVI5 in 1p22.1 and PGAP3, ERBB2, PNMT, and GSDMA in 17q12 for A2 glycan (GP2) in skeletal muscle. These results provided more genomic context, including candidate genes, regulatory variants, and relevant tissues for future functional studies of IgG N-glycosylation.
A gene expressed specifically in a tissue type or cell type usually reflects the biological processes in which the gene is involved and its biological functions (57). In the current study, we demonstrate that associations along IgG N-GP/SNP/gene expression in a tissuespecific SNP scenario appear to be dependent on using expression data derived from N-GP-enriched tissue. For instance, FA2G1S1 glycan (GP16) is positively associated with the expression of PSORS1C2 in sun-exposed lower leg skin via a single eSNP rs1265099 (Z = 5.24, p = 1.65E−07) but is not associated with any other genes or tissues. PSORS1C2 encodes a keratinocyte cornification-associated protein, which is specifically expressed in two skin tissue types evaluated by GTEX v7 (Supplementary Figure  5). The protein product of this gene plays a primary role in the terminal differentiation of keratinocytes (58). Recent studies have shown that PSORS1C2 is strongly upregulated in peeling skin disease (59) and is also associated with autoimmune skin diseases including vitiligo (60) and psoriasis (61) by GWAS analyses. Also, aberrant IgG FA2G1S1 glycan (GP16) has been observed to be associated with SLE (62) and colorectal cancer (63). Our finding that this IgG FA2G1S1 glycan (GP16) genetic predisposition (rs1265099) may have tissue-specific effects on the expression of PSORS1C2 within skin tissue thus supports the hypothesis that skin tissue can contribute to the regulation of IgG FA2G1S1 glycan biosynthesis and/or have the potential to serve as a proxy for several skin-related autoimmune diseases. In total, four genes (PSORS1C2, HLA-DRA, APOM, and SAPCD1) were significant in sun-exposed lower leg skin TWAS models. Broadly, half of the significant genes show tissue specificity on IgG N-glycosylation in other TWAS tissue models. These tissue-specific candidate gene sets are a promising source for further investigations into genetic effects underlying the interactions between highly diverse IgG N-GPs and tissue-specific genetic expression within corresponding tissues.
In contrast, genetic variants that affect gene expression levels in multiple tissues are more likely to affect multiple complex traits (64). The pleiotropic gene findings in this study can confirm that IgG N-glycosylation TWAS genes expressed in multiple tissues are more likely to have a wide range of downstream phenotypic consequences (i.e., diverse N-glycosylation modification). As an example in the present study, 19 genes are identified as IgG Nglycosylation candidate genes in the small intestine terminal ileum with 10 pleiotropic and 9 tissue-specific genes. Within the ten pleiotropic genes, C4A, FLOT1, HLA-C, and WNT3 perform important roles in N-glycosylation-related biological processes, while the remaining nine tissue-specific genes are associated with intestine-related functions, e.g., FLOT1 is responsible for encoding flotillin 1 and is ubiquitously expressed across all tissue types evaluated by GTEx v7 (Supplementary Figure 6). Flotillin 1 localizes to the caveolae and plays a role in several super pathways, e.g., the angiopoietin-like protein 8 regulatory pathway, cytoskeletal signaling, regulation of lipid metabolism and beta-adrenergic signaling. As a tissue-specific gene, GSDMB was reported to be expressed exclusively in the epithelium of the gastrointestinal tract in a highly tissue-specific manner (65). SLC22A5 was reported as being responsible for carnitine transport across apical membranes of intestinal epithelial cells (66). Pleiotropic genes and eSNPs account for almost half of the other significant TWAS hits. Most of these hits have previously been reported in the three core N-glycosylation-related biological processes, including N-glycosylation, immune response, and protein translation, while tissue-specific genes are mainly annotated as being involved in the physiological activities of corresponding tissues. This result partially explains why plasma IgG N-glycosylation demonstrates a relatively stable holistic pattern on each monosaccharide glycan although based on such complicated branching patterns. The pleiotropic candidate genes maintain the primary patterns of N-glycosylation in most related tissues for a house-keeping N-glycosylated level, whereas the tissue-dependent candidate genes regulate tissue-specific IgG Nglycan patterns to adopt local inflammation, maintaining homeostasis during physiology and pathophysiology processes.
By the evidence of the coassociations among eSNP and IgG N-GPs, our results shed considerable light on the regulatory mechanism of the equilibrium between some pairs of IgG Nglycosylation. For example, in terms of Z-score in TWAS results, rs761830 simultaneously correlated with A2 glycan (GP2) (Z = −11.70, p = 1.34E−31) and FA2G2 glycan (GP14) (Z = 4.52, p = 6.22E−06) and linked them with FUT8 gene in skeletal muscle in two reverse directions (Supplementary Table 2 and Figure 2). This linked association provides two layers of meaning. Firstly, in IgG N-glycosylation GWAS statistics, the effect allele (rs761830-A) at the FUT8 locus is associated with a decreased level of GP2 (rs761830, p = 4.08E−38) but an increased level of GP14 (rs761830, p = 6.83E−06). The finding indicates that these two GPs share identical heritability, being associated antagonistically in the IgG N-glycome. Conversely, increasing A2 glycan (GP2) and decreasing FA2G2 glycan (GP14) are consistent with the changing of certain IgG N-GPs which have been observed in association studies on chronic diseases (Supplementary Table 8), such as dyslipidemia (DL) (67), SLE (62), RA (68), and chronic kidney disease (CKD) (69). Secondly, eQTL statistics of GTEx v7 shows the effect allele rs761830-A is responsible for increased changes in gene expression of FUT8 in skeletal muscle (rs761830, p = 2.60E−06). Hence, the genetic predisposition tagged by SNP rs761830 whose allelic regulation on the expression of FUT8 gene in skeletal muscle may contribute to the equilibrium of IgG N-GP regulations, such as the above-mentioned two types of N-GPs.
By GO pathway enrichment and pheWAS investigation, our study provides additional evidence for the involvement of IgG Nglycosylation in several IgG N-GP-related diseases. Since the establishment of UPLC-based high-throughput IgG N-glycomic profiling (70), certain specific IgG N-GPs differed significantly from the average patterns and were indicative of clinical conditions, such as the level of monosialylated glycans which was most strongly correlated with MetS-related risk factors, especially with systolic blood pressure (SBP) (71). Further research in other independent cohorts confirmed the existence of a similar relationship between selective IgG N-glycan patterns and signs of autoimmunity, e.g., IBD (49,72), SLE (62), and RA (73), neurological disorders, e.g., Parkinson's disease (4) and dementia (74), and vascular diseases, e.g., ischemic stroke (75) and cancers (63), as previously mentioned (Supplementary Table S7). The consistency of candidate genes, combined with the coexpression of IgG N-glycosylation-related genes within multiple tissues, GO enrichment gene sets, and the pheWAS investigation on TWAS eSNPs implies that IgG N-glycosylation may be the mediated phenotype sharing genetic correlations with its related diseases. The allelic regulatory genetic variants of IgG N-glycosylation within certain tissues may be involved in the pathophysiological processes of its related diseases via regulation of their target genes in tissue-specific modes.
Efforts to generate ever-larger sets of tissue-specific genetic effects data will facilitate data mining opportunities for investigating the regulatory mechanisms underlying trait and gene associations (76). Although the current study has been restricted to the use of eQTL data and GWAS data generated from independent subpopulations of European ancestry with limited sample sizes, future functional genomic endeavors of tissue-specific regulation on IgG N-glycosylation will benefit from a larger replication cohort simultaneously having IgG Nglycosylation features, genotype data, and gene expression profiles from different tissues. A recent study showed that targeted sialic glycan degradation reinforces the anticancer immune response in an animal experiment as evidenced by the sialylation effect on the surface of cancer cells (77). The construction of in vitro and in vivo models for IgG N-glycosylation are therefore urgently needed for providing more solid experiment-based evidence to confirm the hypothesis generated from the current study. We also note that our study utilized eQTLs as genetic predictors of gene expression. Nevertheless, pleiotropic effects on both gene expression and phenotype could not be ruled out without further analyses. Probabilistic fine-mapping approaches and transcription factor enrichment analyses (78)(79)(80) can positively exclude spurious results. Furthermore, a gene may have regulatory characteristics other than cis-regulation, e.g., trans-regulation, or that does not pass through eQTLs, but still could have downstream effects on expression (81). In theory, beyond the current knowledge of regulatory models, there could be other novel regulatory elements and underlining mechanisms. However, based on allelic cis-regulation, we have been able to successfully prioritize a potential causal gene set of 55 genes for IgG N-glycosylation and its related complex traits and diseases.
In summary, we have performed the first comprehensive analysis so far to identify the most relevant tissues for IgG Nglycosylation and to detect a large number of genetic variants which may regulate their target genes and further contribute to IgG N-glycosylation within corresponding tissues. This knowledge provides a starting point for further mechanistic work on IgG N-glycosylation, which would advance our understanding of IgG N-glycosylation biology and guide the design of future functional studies to explore the specific variants and the heritable regulation of IgG N-glycosylation. More importantly, our approach of combining LDSC-SEG and TWAS is widely applicable to complex traits for which GWAS summary statistics are available and helps our understanding of the molecular basis of the trait at multiple omics levels.

AUTHOR CONTRIBUTIONS
WW conceived the design of this study. XL, HW, and YZ performed the statistical and computational analyses. WC, HH, and MS assisted with data management. WW and XL wrote the manuscript. YW, ML, XG, XT, and JH supervised the project and oversaw the manuscript. All authors contributed to the article and approved the submitted version.