The Effect of Alzheimer’s Disease-Associated Genetic Variants on Longevity

Human longevity is influenced by the genetic risk of age-related diseases. As Alzheimer’s disease (AD) is a common condition in old age, an interplay between the genetic factors affecting AD and longevity is expected. We explored this interplay by studying the prevalence of AD-associated single-nucleotide polymorphisms (SNPs) in cognitively healthy centenarians, and replicated the findings in a parental-longevity GWAS. We found that 28/38 SNPs that increased AD-risk were also associated with lower odds of longevity. For each SNP, we expressed the imbalance between its AD-effect and its longevity-effect as an effect-size distribution. Based on these distributions, we grouped the SNPs into three groups: 17 SNPs increased AD-risk more than they decreased the odds of longevity, and were enriched for β-amyloid metabolism and immune signaling; 11 variants showed a larger longevity-effect than AD-effect, were enriched for endocytosis/immune-signaling, and were previously associated with other age-related diseases. Unexpectedly, 10 variants were associated with both an increased risk of AD and higher odds of longevity. Altogether, we show that different AD-associated SNPs have different effects on longevity, including SNPs that may confer general neuro-protective functions against AD and other age-related diseases.


Genotyping and imputation
Genetic variants in our populations were determined using standard genotyping and imputation methods, and we applied established quality-control procedures. We genotyped all individuals with the Illumina Global Screening Array (GSAsharedCUSTOM_20018389_A2) and excluded individuals and variants with low-quality genotypes (individual call rate <98%, variant call rate <98%), individuals with sex mismatches, and variants deviating from Hardy-Weinberg equilibrium (p<1×10⁻⁶). Genotypes were prepared for imputation by comparing variant identifiers, strand and allele frequencies to the Haplotype Reference Consortium panel (HRC v1.1, April 2016), and all remaining variants were submitted to the Sanger imputation server (https://imputation.sanger.ac.uk). [53] The server uses EAGLE2 (v2.0.5) to phase the data, and imputation to the reference panel was performed with PBWT. [54,55] Before analysis, we excluded individuals of non-European ancestry [...] ≤ 500 , increasing by 50 until at least 1 gene was found). Our procedure allows the association of each variant with one or multiple genes (Figure S2).

Gene-pathway mapping
The resulting list of genes was used to find the molecular pathways enriched in the AD variants. See Figure S2 for a schematic representation of our annotation framework. We realized that allowing multiple genes to associate with each variant could result in an enrichment bias, as neighboring genes are often functionally related. To control for this, we implemented a sampling technique: at each iteration, we (i) sampled one gene from the pool of genes associated with each variant, and (ii) performed a gene-set enrichment analysis with the resulting list of genes. The gene-set enrichment analysis considered biological processes (BP) and was implemented with the enrichGO function of the R package clusterProfiler, using all genes as background and correcting p-values by controlling the False Discovery Rate (FDR). Finally, we averaged the p-values of each enriched term over the iterations (N=1,000).
To facilitate interpretation, we merged significantly enriched biological processes. First, we calculated the semantic similarity between all significant biological processes (i.e. FDR<5%) using Lin as a distance measure. [59] We then applied hierarchical clustering on the resulting distance matrix and selected the number of functional clusters using the dynamic tree-cut method as implemented in the cutreeDynamic function of the R package WGCNA, specifying 15 as the minimum number of terms per cluster (using the default value of 20 resulted in only 2 functional clusters). To provide an interpretation of each functional cluster, we selected the most frequent words describing the biological processes underlying each cluster, and displayed these as word-clouds using the R package wordcloud2. Finally, by counting how often a functional cluster was associated with a gene, we calculated a weighted annotation of each gene to the 4 functional clusters, the so-called gene-pathway mapping (Figure S2). The variant-gene mapping and the gene-pathway mapping procedures were performed using the web-server application available at https://snpxplorer.net. [24]
Because we initially selected only significantly enriched BP, not every gene in the list of variant-associated genes is annotated with at least one of these terms. Consequently, these genes could not be related to the final functional clusters. To overcome this, we connected these genes to the functional clusters using a k-nearest neighbor (k-NN) imputation. The k-NN model was first trained using the functional clusters as classes and the semantic similarity matrix between the enriched biological processes as features (feature terms). Then, for each gene with missing annotation, we (i) extracted all the biological processes the gene is involved in (input biological processes), and (ii) calculated the semantic similarity matrix between these terms and the feature terms, which defines the similarity between the input biological processes and the feature terms. Finally, we (iii) predicted, from this similarity matrix, the probability of membership in each class (functional cluster), and used these probabilities as weights for the gene-pathway mapping (Figure S2).
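The sampling step used for the enrichment analysis in this section can be illustrated with a minimal Python sketch. It is not the authors' implementation, which relied on enrichGO in R. Here a plain hypergeometric over-representation test and a Benjamini-Hochberg correction stand in for enrichGO, and we assume that the FDR-adjusted p-values are the ones averaged over iterations. All gene names, term names and the function `sampled_enrichment` are hypothetical.

```python
import random
from math import comb


def hypergeom_sf(k, M, n, N):
    """P(X >= k) when N genes are drawn from M background genes, n of them annotated."""
    upper = min(n, N)
    hits = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return hits / comb(M, N)


def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def sampled_enrichment(variant_genes, term_genes, background, n_iter=1000, seed=0):
    """Average adjusted enrichment p-values over repeated one-gene-per-variant samples."""
    rng = random.Random(seed)
    terms = sorted(term_genes)
    totals = {t: 0.0 for t in terms}
    M = len(background)
    for _ in range(n_iter):
        # (i) sample one gene from the pool of genes linked to each variant
        sampled = {rng.choice(sorted(genes)) for genes in variant_genes.values()}
        # (ii) over-representation test of the sampled gene list for every term
        pvals = []
        for t in terms:
            annotated = term_genes[t] & background
            k = len(sampled & annotated)
            pvals.append(hypergeom_sf(k, M, len(annotated), len(sampled)))
        for t, p in zip(terms, benjamini_hochberg(pvals)):
            totals[t] += p
    return {t: totals[t] / n_iter for t in terms}


# Hypothetical toy data: 30 background genes, 4 variants, 2 terms.
background = {f"g{i}" for i in range(30)}
variant_genes = {
    "v1": {"g0", "g10"},
    "v2": {"g1"},
    "v3": {"g2", "g11"},
    "v4": {"g12"},
}
term_genes = {
    "T_related": {"g0", "g1", "g2", "g3"},
    "T_unrelated": {"g20", "g21", "g22", "g23"},
}
avg_p = sampled_enrichment(variant_genes, term_genes, background, n_iter=200)
```

Sampling one gene per variant means that a locus with many neighboring, functionally related genes contributes a single gene to each enrichment test, which is the bias the procedure is meant to limit.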

Variant-pathway mapping
The variant-pathway mapping represents the combined annotation of each variant to the different functional clusters. As such, it depends on the variant-gene mapping and the gene-pathway mapping. Briefly, given a variant v, we (i) retrieved all the genes that were associated with the variant in the variant-gene mapping, G_v, and (ii) retrieved all the biological processes (gene ontology term identifiers) that were associated with these genes, BP_v. Because we clustered biological processes into functional clusters, by looking at which functional clusters the terms in BP_v belonged to, we could assign a weight of association for variant v to each of the functional clusters.
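As an illustration of the variant-pathway mapping, the following minimal Python sketch chains the two lookups described above. It rests on an assumption the text does not state: that the weight of a variant for a cluster is the fraction of its gene-term annotations falling in that cluster. The identifiers, the toy annotations and the function `variant_pathway_weights` are hypothetical.

```python
from collections import Counter


def variant_pathway_weights(variant, variant_genes, gene_terms, term_cluster, clusters):
    """Weight of a variant for each functional cluster (weights sum to 1 when annotated)."""
    # (i) genes associated with the variant in the variant-gene mapping
    genes = variant_genes[variant]
    # (ii) biological-process terms associated with those genes
    terms = [t for g in genes for t in gene_terms.get(g, [])]
    # count how often each functional cluster is reached through those terms
    counts = Counter(term_cluster[t] for t in terms if t in term_cluster)
    total = sum(counts.values())
    if total == 0:
        return {c: 0.0 for c in clusters}
    return {c: counts[c] / total for c in clusters}


# Hypothetical toy annotations; term identifiers are placeholders, not real GO terms.
clusters = ["cluster 1", "cluster 2", "cluster 3", "cluster 4"]
variant_genes = {"rs_example": ["GENE_A", "GENE_B"]}
gene_terms = {"GENE_A": ["GO:a", "GO:b"], "GENE_B": ["GO:b", "GO:c"]}
term_cluster = {"GO:a": "cluster 1", "GO:b": "cluster 4", "GO:c": "cluster 4"}

weights = variant_pathway_weights(
    "rs_example", variant_genes, gene_terms, term_cluster, clusters
)
```

In this toy case one of the four gene-term annotations points to cluster 1 and three point to cluster 4, so the variant is weighted 0.25 and 0.75 for those clusters and 0 for the others.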

Variant-cell-type mapping
To study brain-specific cell-types and their relationship with AD-associated variants, we used the publicly available gene [...]

Figure: Dendrogram of the hierarchical clustering analysis and the 4 functional clusters, along with word-clouds of the most frequent terms per cluster. Hierarchical clustering was performed on the semantic similarity distance matrix (using Lin as semantic similarity metric). We used the dynamic tree-cut method to define the number of functional clusters, specifying 15 as the minimum number of terms per cluster. We then used word-cloud visualization as well as manual interpretation of the biological processes underlying each functional cluster to label the clusters as Lipid/Cholesterol metabolism (cluster 1), β-Amyloid metabolism (cluster 2), Synaptic plasticity (cluster 3) and Endocytosis/Immune signaling (cluster 4).