Genetic and Expression Analysis of COPI Genes and Alzheimer’s Disease Susceptibility

Alzheimer’s disease (AD) is the most common neurodegenerative disease in the elderly and the leading cause of dementia in humans. Evidence shows that cellular trafficking and recycling machineries are associated with AD risk. A recent study found that coat protein complex I (COPI)–dependent trafficking in vivo could significantly reduce amyloid plaques in the cortex and hippocampus of AD mouse models, and identified 12 single-nucleotide polymorphisms (SNPs) in COPI genes that were significantly associated with increased AD risk in 6,795 samples. Here, we used a large-scale genome-wide association study (GWAS) dataset to investigate the potential association between the COPI genes and AD susceptibility using both SNP-based and gene-based tests. The results showed that only rs9898218 was associated with AD risk (P = 0.017). We further conducted an expression quantitative trait loci (eQTLs) analysis and found that the rs9898218 G allele was associated with increased COPZ2 expression in cerebellar cortex (P = 0.0184). Importantly, the eQTLs analysis in whole blood further indicated that 11 of these 12 genetic variants could significantly regulate the expression of COPI genes. Hence, these findings may contribute to understanding the association between COPI genes and AD susceptibility.


INTRODUCTION
Alzheimer's disease (AD) is the most common neurodegenerative disease in the elderly and the leading cause of dementia in humans (Jiang et al., 2017;Liu et al., 2017b). Genetic risk factors are thought to contribute to AD (Van Cauwenberghe et al., 2016). In recent years, various methods have been used to detect the underlying AD genetic factors. For example, candidate gene studies have identified mutations in APP, PSEN1, and PSEN2 that are associated with autosomal dominant AD (Van Cauwenberghe et al., 2016). APOE has been reported to be associated with both early- and late-onset AD (Van Cauwenberghe et al., 2016). Large-scale genome-wide association studies (GWASs) have identified several novel genetic risk loci in European populations, and candidate gene studies have replicated these findings in other populations (Liu et al., 2012;Liu et al., 2013b;Liu et al., 2013c;Liu et al., 2013d;Liu et al., 2014a;Liu et al., 2014b;Liu et al., 2014c;Chen et al., 2015;Li et al., 2015;Shen et al., 2015;Zhang et al., 2015;Chang et al., 2016;Li et al., 2016;Liu and Jiang, 2016;Ma et al., 2016;Tan et al., 2016;Zhang et al., 2016b). Whole-genome sequencing has highlighted the role of TREM2 in AD (Zhang et al., 2016a;Ulland and Colonna, 2018). However, these AD susceptibility loci explain only 28.57% of AD genetic risk (Cuyvers and Sleegers, 2016). A large proportion of AD heritability remains unexplained.
Bettayeb et al. (2016) identified 12 SNPs in COPI genes that were significantly associated with increased AD risk. In their study, they selected six independent study cohorts, including two family-based studies and four case-control association studies, and performed a meta-analysis using a total of 6,795 samples (4,018 AD cases and 2,777 controls). To date, several large-scale AD GWASs have been performed (Harold et al., 2009;Lambert et al., 2009;Holliday et al., 2012;Lambert et al., 2013;Jansen et al., 2019). Hence, we used a large-scale AD GWAS dataset to investigate the association of these 12 genetic variants and the COPI genes with AD risk using a single-SNP test and a gene-based test (Lambert et al., 2013). In addition, because the function of the significant SNP is unknown, we conducted an expression quantitative trait loci (eQTLs) analysis.

MATERIALS AND METHODS
AD GWAS Dataset
We selected the AD GWAS dataset from the International Genomics of Alzheimer's Project (IGAP) (Lambert et al., 2013). IGAP is a large two-stage GWAS of individuals of European ancestry. In stage 1, IGAP used genotyped and imputed data on 7,055,881 SNPs for a meta-analysis of four previously published GWAS datasets (European Alzheimer's Disease Initiative, Alzheimer Disease Genetics Consortium, Cohorts for Heart and Aging Research in Genomic Epidemiology consortium, and Genetic and Environmental Risk in AD consortium) consisting of 17,008 AD cases and 37,154 controls (Lambert et al., 2013). All patients with AD satisfied the National Institute of Neurological and Communicative Disorders and Stroke and the Alzheimer's Disease and Related Disorders Association criteria or the Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition guidelines (Lambert et al., 2013). Previous studies have provided more detailed information about the IGAP dataset (Jiang et al., 2017;Liu et al., 2017a;Liu et al., 2018d).

SNP-Based Test
Here, we investigated the association between these 12 variants and AD susceptibility using the summary association results from the above study (Lambert et al., 2013). If one of these 12 variants was not available in the AD GWAS dataset, we used HaploReg (version 4) to identify proxy SNPs based on the linkage disequilibrium (LD) information from the 1000 Genomes Project (Ward and Kellis, 2012).
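The lookup-with-fallback logic described above can be sketched as follows. This is a minimal illustration, not the HaploReg interface: the function name pick_snp, the layout of the LD table, the contents of GWAS_SNPS, and the min_r2 cutoff are all assumptions made for the example. Only the SNP identifiers and the two r2 values are taken from this paper.

```python
from typing import Iterable, Optional, Set, Tuple

# (query SNP, candidate proxy, r2) rows, as an LD lookup might return them.
# The r2 values are those reported in this paper; the table layout is assumed.
LD_TABLE = [
    ("rs7531886", "rs2298104", 0.84),
    ("rs34280607", "rs34192202", 0.69),
]

# Hypothetical subset of SNP IDs present in the GWAS summary statistics.
GWAS_SNPS = {"rs9898218", "rs2298104", "rs34192202"}


def pick_snp(
    target: str,
    available: Set[str],
    ld_table: Iterable[Tuple[str, str, float]],
    min_r2: float = 0.5,  # assumed cutoff; the paper does not state one
) -> Optional[str]:
    """Return the target SNP if it is in the dataset, else its best proxy."""
    if target in available:
        return target
    # Keep only proxies for this target that are present and pass the cutoff.
    candidates = [
        (r2, proxy)
        for snp, proxy, r2 in ld_table
        if snp == target and proxy in available and r2 >= min_r2
    ]
    if not candidates:
        return None
    # The best proxy is the one with the highest r2.
    return max(candidates)[1]


print(pick_snp("rs9898218", GWAS_SNPS, LD_TABLE))
print(pick_snp("rs7531886", GWAS_SNPS, LD_TABLE))
```

Ranking proxies by r2 rather than D′ is a design choice of this sketch: r2 reflects how well the proxy predicts the genotype at the missing SNP.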

Gene-Based Test
We performed a gene-based test of this large-scale AD GWAS dataset using a commonly used method, the PLINK set-based test (SET SCREEN TEST) (Moskvina et al., 2011). This test works as a meta-analysis across all the SNPs in the corresponding gene (Moskvina et al., 2011). The method uses an approximate Fisher's test to combine P values across all the SNPs in a gene and adjusts for LD (Moskvina et al., 2011). We also performed a gene-based test of this large-scale AD GWAS dataset using VEGAS (Liu et al., 2010). The VEGAS software incorporates information from all SNPs within a gene and adjusts for gene size, SNP density, and the LD between SNPs (Liu et al., 2010). VEGAS assigns SNPs to 17,787 autosomal genes according to the positions of SNPs and genes (± 50 kb from the 5′ and 3′ UTR) (Liu et al., 2010). Previous studies have provided more detailed information about the PLINK and VEGAS methods (Liu et al., 2013a;Liu et al., 2017d;Li et al., 2018;Lang et al., 2019).
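To illustrate the idea of combining SNP-level P values into one gene-level P value, the sketch below implements the classical Fisher combination for independent tests. It is not the PLINK or VEGAS implementation and it does not adjust for LD, which the methods above do; it only shows the unadjusted starting point. The function names and the example P values are hypothetical.

```python
import math
from typing import Sequence, Tuple


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable with even df.

    For df = 2k the tail has the closed form
    exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term = 1.0   # (x/2)^0 / 0!
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def fisher_combined_p(p_values: Sequence[float]) -> Tuple[float, float]:
    """Fisher's method: -2 * sum(ln p) follows chi-square with 2k df
    when the k tests are independent (no LD adjustment here)."""
    if not p_values or any(p <= 0.0 or p > 1.0 for p in p_values):
        raise ValueError("P values must lie in (0, 1]")
    stat = -2.0 * sum(math.log(p) for p in p_values)
    return stat, chi2_sf_even_df(stat, 2 * len(p_values))


# Hypothetical SNP-level P values for one gene.
stat, gene_p = fisher_combined_p([0.017, 0.40, 0.75])
print(stat, gene_p)
```

Because SNPs within a gene are usually correlated, this unadjusted P value tends to be too small, which is why LD-aware gene-based tests are used in practice.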

eQTLs Analysis
We selected the eQTLs dataset from the Brain eQTL Almanac (Braineac), which is a web-based resource for accessing the UK Brain Expression Consortium dataset (Ramasamy et al., 2014). This resource includes 134 neuropathologically normal individuals of European descent and covers 10 brain tissues (Ramasamy et al., 2014). For each individual, we obtained the COPI gene expression data and the genotype data for the 12 SNPs (Ramasamy et al., 2014). We then evaluated the association of each SNP with nearby gene expression using a linear regression analysis under an additive model. In addition to normal human brain tissues, we further evaluated whether these genetic variants could regulate the expression of nearby genes in neurodegenerative disease tissues. We selected two eQTLs datasets from 197 AD cerebellar samples and 202 AD temporal cortex samples (Zou et al., 2012). The significance level was set at P < 0.05. Recent studies have provided more detailed information about eQTLs analysis using Braineac (Hu et al., 2017;Liu et al., 2017a;Liu et al., 2017c;Liu et al., 2018a;Liu et al., 2018b;Liu et al., 2018c;Zhang et al., 2018). In addition, we conducted an eQTLs analysis in whole blood using the large-scale dataset from the eQTLGen Consortium (Võsa et al., 2018). The consortium incorporates 37 datasets, with a total of 31,684 individuals (Võsa et al., 2018).
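A linear regression under an additive model codes each genotype as the number of copies of one allele (0, 1, or 2) and regresses expression on that count. The sketch below shows the calculation with ordinary least squares and no covariates. The function name additive_eqtl and the toy data are hypothetical, and the actual analysis may have included covariates that are not modeled here.

```python
import math
from typing import Sequence, Tuple


def additive_eqtl(
    dosages: Sequence[float], expression: Sequence[float]
) -> Tuple[float, float, float]:
    """Regress expression on allele dosage (0/1/2).

    Returns (beta, standard error, t statistic). beta is the change in
    expression per additional copy of the counted allele.
    """
    n = len(dosages)
    if n != len(expression) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mean_x = sum(dosages) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("SNP is monomorphic in this sample")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, expression))
    beta = sxy / sxx
    intercept = mean_y - beta * mean_x
    # Residual variance with n - 2 degrees of freedom.
    rss = sum((y - intercept - beta * x) ** 2 for x, y in zip(dosages, expression))
    se = math.sqrt(rss / (n - 2) / sxx)
    t_stat = beta / se if se > 0 else float("nan")
    return beta, se, t_stat


# Toy data (hypothetical): dosage of the counted allele and expression level.
toy_dosage = [0, 0, 1, 1, 2, 2]
toy_expr = [1.0, 1.2, 1.5, 1.7, 2.0, 2.2]
beta, se, t_stat = additive_eqtl(toy_dosage, toy_expr)
print(beta, se, t_stat)
```

The P value would follow from comparing the t statistic with a t distribution on n − 2 degrees of freedom; that step is omitted to keep the sketch free of external libraries.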

RESULTS
SNP-Based Test
Ten of the 12 SNPs were included in this GWAS dataset; rs7531886 and rs34280607 were not. We therefore applied HaploReg (version 4) to identify their proxy SNPs based on the LD information from the 1000 Genomes Project (EUR) (Ward and Kellis, 2012). We selected the best tagging SNP for each missing variant: rs2298104, in LD with rs7531886 (r² = 0.84 and D′ = 0.99), and rs34192202, in LD with rs34280607 (r² = 0.69 and D′ = 0.91). Among these 12 SNPs, only rs9898218 showed a significant association with AD risk (P = 0.017), as described in Table 1.
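The two LD measures quoted for the proxy SNPs, r² and D′, can both be derived from haplotype and allele frequencies. The sketch below uses the standard definitions. The function name ld_stats and the input frequencies are hypothetical; they are not the 1000 Genomes values behind the figures above.

```python
from typing import Tuple


def ld_stats(p_ab: float, p_a: float, p_b: float) -> Tuple[float, float]:
    """Return (r2, D') for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a, p_b: frequencies of allele A and allele B
    """
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom <= 0:
        raise ValueError("both loci must be polymorphic")
    r2 = d * d / denom
    # D' rescales |D| by its maximum possible value given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    return r2, d_prime


# Hypothetical frequencies: allele A never occurs without allele B,
# but the two alleles differ in frequency.
r2, d_prime = ld_stats(p_ab=0.3, p_a=0.3, p_b=0.4)
print(r2, d_prime)
```

In this toy case D′ is at its maximum while r² is clearly below 1, which illustrates why a proxy can show D′ close to 1 and a lower r², as with the two proxy SNPs reported here.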

eQTLs Analysis
As shown in Table 1, rs9898218 is the only SNP associated with AD risk (P = 0.017). We first evaluated the association between rs9898218 and COPZ2 expression using the Braineac dataset. The results showed that the rs9898218 G allele was associated with increased COPZ2 expression in cerebellar cortex (P = 1.84E−02), but not in the other nine brain tissues (Table 2).
In cerebellar cortex, rs2298104 and rs7531886 were associated with the expression of COPA, and rs11650615 was associated with the expression of COPZ2. In occipital cortex, both rs12033011 and rs7531886 were associated with the expression of COPA. In putamen, rs12033011, rs2298104, and rs7531886 were associated with the expression of COPA (Table 3).
In the two AD eQTLs datasets, three of these 12 genetic variants (rs11650615, rs9898218, and rs498872) were available. However, none of these three genetic variants was associated with COPI gene expression in AD cerebellar or temporal cortex tissues (Table 4).
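Effect estimates are always reported relative to one allele, so comparing the GWAS and eQTLs results for rs9898218 requires expressing both for the same allele. The sketch below shows the two conversions involved, using the β values given in the Discussion. The function names are hypothetical, and reading the GWAS β as a log odds ratio is an assumption based on the usual convention for case-control summary statistics.

```python
import math
from typing import Tuple


def flip_effect(
    beta: float, effect_allele: str, other_allele: str
) -> Tuple[float, str, str]:
    """Re-express a per-allele effect for the other allele of a biallelic SNP.

    The magnitude is unchanged; only the sign and the allele labels swap.
    """
    return -beta, other_allele, effect_allele


def odds_ratio(beta: float) -> float:
    """Convert a log odds ratio to an odds ratio (assumes beta is a log OR)."""
    return math.exp(beta)


# eQTL effect of the rs9898218 T allele on COPZ2 expression in cerebellar
# cortex, re-expressed for the G allele.
print(flip_effect(-0.069, "T", "G"))

# GWAS effect of the rs9898218 T allele on AD risk, as an odds ratio.
print(round(odds_ratio(0.040), 4))
```

Flipping the eQTL estimate shows that a reduction per T allele and an increase per G allele describe the same association.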

DISCUSSION
In recent years, COPI genes have been reported to be potentially involved in AD. For example, a cluster analysis of microarray data indicated an association between COPA and AD (Guttula et al., 2012). Dynamic regulatory network reconstruction analysis showed gradually depressed activity of COPA (Kong et al., 2014). Bettayeb et al. (2016) highlighted 12 SNPs, including rs7531886, rs12033011, rs72868007, and rs73022058.
With the wide application of the GWAS method in AD, a finding can now be validated rapidly using a large-scale AD GWAS dataset. Here, we selected a large-scale AD GWAS dataset and performed both SNP-based and gene-based tests. We expect this large-scale dataset to be more powerful than the original dataset (Bettayeb et al., 2016). In the SNP-based test, the rs9898218 T allele was associated with increased AD risk (β = 0.040 and P = 0.017). The two gene-based test methods indicated no significant association between these COPI genes and AD susceptibility. Interestingly, the eQTLs analysis further showed that the rs9898218 T allele was associated with reduced COPZ2 expression in cerebellar cortex (β = −0.069 and P = 1.84E−02), but not in the other nine brain tissues. Meanwhile, we identified four other genetic variants (rs12033011, rs2298104, rs7531886, and rs11650615) regulating COPI gene expression in other human brain tissues. Importantly, the eQTLs analysis in whole blood further indicated that 11 of these 12 genetic variants could significantly regulate the expression of COPI genes.
COPZ2 encodes a member of the adaptor complexes small subunit family (Shtutman et al., 2011). Evidence showed down-regulated COPZ2 expression in most tumor cell lines and in individuals with various cancer types (Shtutman et al., 2011). Interestingly, recent studies have highlighted the role of COPZ2 in AD (Ciryam et al., 2016;Wan Nasri et al., 2018). Wan Nasri et al. (2018) evaluated the effect of 6 months of tocotrienol-rich fraction supplementation on gene expression in the hippocampus of wild-type (WT) mice (n = 4) and APPswe/PS1dE9 double transgenic AD mice (n = 4). They found that Copz2 was significantly down-regulated in the AD group compared with the WT group (P = 6.44E−05 and fold change = −4.5788) (Wan Nasri et al., 2018). Ciryam et al. (2016) conducted a meta-analysis of gene expression data from about 1,600 human central nervous system tissues to investigate the transcriptional changes upon aging and as a result of AD. They found that COPZ2 was up-regulated in AD (P = 3.90E−05 and fold change = 1.13).
In summary, these findings may provide important information about the association between COPI genes and AD susceptibility. Meanwhile, future studies are required to replicate these findings using large-scale GWAS and eQTLs datasets.

DATA AVAILABILITY
Publicly available datasets were analyzed in this study. These data can be found here: http://web.pasteur-lille.fr/en/recherche/u744/igap/igap_download.php.

AUTHOR CONTRIBUTIONS
LS and HZ proposed the project. YY collected and analyzed the data. All authors wrote the manuscript and approved the final version of the manuscript.