Transcriptomic Changes Highly Similar to Alzheimer’s Disease Are Observed in a Subpopulation of Individuals During Normal Brain Aging

Aging is a major risk factor for late-onset Alzheimer’s disease (LOAD), but how aging contributes to the development of LOAD remains elusive. In this study, we examined multiple large-scale transcriptomic datasets from both normal aging and LOAD brains to understand the molecular interconnection between aging and LOAD. We found that gene expression changes shared between aging and LOAD are mostly seen in the hippocampus and several cortical regions. In the hippocampus, the expression of phosphoprotein, alternative splicing, and cytoskeleton genes is commonly changed in both aging and AD, while synapse, ion transport, and synaptic vesicle genes are commonly down-regulated. Aging-specific changes are associated with acetylation and methylation, while LOAD-specific changes are more related to glycoprotein (both up- and down-regulation), inflammatory response (up-regulation), and myelin sheath and lipoprotein (down-regulation). We also found that normal aging brain transcriptomes from relatively young donors (45–70 years old) clustered into several subgroups, some of which showed gene expression changes highly similar to those seen in LOAD brains. Using brain transcriptomic datasets from another cohort of older individuals (>70 years), we found that samples from cognitively normal older individuals clustered with the “healthy aging” subgroup, while AD samples mainly clustered with the “AD similar” subgroups. This may imply that individuals in the healthy aging subgroup are likely to remain cognitively normal as they grow older, and vice versa. In summary, our results suggest that, at the transcriptome level, aging and LOAD are strongly interconnected in some brain regions in a subpopulation of cognitively normal aging individuals.
This supports the theory that LOAD is initiated decades before the clinical phenotype manifests, and suggests that it may be essential to study “normal” brain aging closely to identify the very early molecular events that may lead to LOAD development.


INTRODUCTION
Alzheimer's disease (AD) is the most common cause of dementia, and about 6.2 million Americans live with the disease according to the Alzheimer's Association 2021 report (2021). Aging is the major risk factor for late-onset AD (LOAD), which occurs at age 65 or older (Caselli et al., 2009) and represents over 95% of AD cases. A study analyzing 1,246 subjects aged 30-95 years found that the risk of developing AD increases dramatically in APOE ε4 carriers aged 70 years or older (Jack et al., 2015). It is well recognized that normal brain aging and LOAD share multiple features; for example, aging brains often show some degree of cognitive impairment, memory loss, metabolic disturbances, bioenergetic deficits, and inflammation. Although aging increases the risk of AD and the two processes are similar in multiple respects, a detailed, brain-region-specific view of their interconnection at the molecular level is not yet available. It is also unclear which aging mechanisms contribute most to AD development, and why some individuals age without major cognitive deficits while others develop AD (Koivisto et al., 1995; Herrup, 2010). To help address these issues, we performed a global comparison of the transcriptomes of normal aging and LOAD brains across multiple regions, in the hope of gaining new insights into the molecular interconnection between aging and AD.
Although many transcriptomic studies have investigated aging and LOAD independently, only a few have systematically compared normal aging and LOAD transcriptomic datasets (Cribbs et al., 2012; Berchtold et al., 2013; Mastroeni et al., 2017; Lanke et al., 2018). For example, Berchtold et al. (2013) used microarrays to profile 81 aging and AD brains and found that synapse-related genes were progressively down-regulated in both aging and AD. Using the same dataset, Lanke et al. (2018) performed an integrated analysis of young, aging, and AD brains by constructing coexpression network models; they found that modules associated with astrocytes, endothelial cells, and microglia were up-regulated, whereas modules associated with neurons, mitochondria, and the endoplasmic reticulum were down-regulated, and that these modules correlated significantly with both AD and aging. These studies have greatly advanced our understanding of the interconnections between aging and AD. However, they examined a limited number of brain regions and, more importantly, treated aging and AD samples as uniform groups, without considering possible heterogeneity in either aging or AD.
In this study, we compared the gene expression profiles of normal brain aging (age ≤ 70) and LOAD (age ≥ 60) (Yang et al., 2015; Hodes and Buckholtz, 2016) to understand the similarities and differences between aging and AD across multiple brain regions. We also took brain aging subgroups into account and compared these subgroups with LOAD.

Data Collection and Pre-processing
We compiled and processed multiple large-scale human brain aging and AD gene expression datasets. We summarize and describe each dataset below and list all the data we studied in Table 1.

Genotype-Tissue Expression Brain Data
Genotype-Tissue Expression (GTEx) brain gene expression data (v7 and v8) from 13 brain regions were downloaded from the GTEx portal (Consortium, 2015) and the NIH dbGaP database. Donor ages ranged from 20 to 70 years, and samples from donors annotated with any brain disease were removed from further analysis. To identify gene expression associated with donors' chronological age, we corrected for sex, collection center (batch), RNA integrity number (RIN), postmortem interval (PMI), and the top 3 genotype principal components (PCs).

Mount Sinai Brain Data
Three hundred and sixty-four human brains (238 females and 126 males) were obtained from the Mount Sinai/JJ Peters VA Medical Center Brain Bank (MSBB; Mount Sinai NIH Neurobiobank) cohort. These samples represented the full spectrum of cognitive and neuropathological disease severity in the absence of discernible non-AD neuropathology. Donor ages ranged from 61 to 108 years. For the microarray profiles of 19 brain regions, genes differentially expressed among disease stage groups were identified with a linear model using the R package limma with default parameters, correcting for sex, PMI, pH, and race (Wang et al., 2016). For each brain region, differentially expressed genes (DEGs) with FDR ≤ 0.05 for any of the 6 traits (CDR, Braak, CERAD, PLQ_Mn, NPrSum, and NTrSum; contrasts High vs. Low, Low vs. Normal, or High vs. Normal) were combined as the DEGs for that region (see Supplementary Table 1) (Wang et al., 2016).
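The per-region DEG definition above (a gene counts if it passes FDR ≤ 0.05 in any trait and contrast) can be sketched in Python as below. This is a minimal illustration, not the pipeline of Wang et al. (2016): it assumes Benjamini-Hochberg FDR (the default `adjust.method` of limma's `topTable`), and the input layout and the function names `bh_fdr` and `region_degs` are hypothetical.

```python
import numpy as np


def bh_fdr(pvalues):
    """Benjamini-Hochberg adjusted p-values (the 'BH' method)."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)
    # p_(i) * n / i for the sorted p-values
    scaled = p[order] * n / np.arange(1, n + 1)
    # enforce monotonicity with a running minimum from the largest p-value down
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(scaled, 1.0)
    return adjusted


def region_degs(pvalue_table, fdr_cutoff=0.05):
    """Union of genes passing the FDR cutoff in any trait/contrast of one region.

    pvalue_table: hypothetical layout, a dict mapping (trait, contrast) to a
    dict of gene -> raw p-value.
    """
    degs = set()
    for gene_p in pvalue_table.values():
        genes = list(gene_p)
        fdr = bh_fdr([gene_p[g] for g in genes])
        degs.update(g for g, q in zip(genes, fdr) if q <= fdr_cutoff)
    return degs


table = {
    ("CDR", "High vs. Normal"): {"GENE1": 0.001, "GENE2": 0.20},
    ("Braak", "Low vs. Normal"): {"GENE1": 0.50, "GENE3": 0.0001},
}
print(sorted(region_degs(table)))  # ['GENE1', 'GENE3']
```

FDR is computed separately within each trait/contrast before taking the union, mirroring the "any of the 6 traits" wording rather than pooling all p-values.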

Other Brain Data
AD signatures were also obtained from the AMP-AD knowledge portal (Hodes and Buckholtz, 2016).

Jager Alzheimer's Disease Gene Signatures
Mostafavi et al. (2018) analyzed 478 ROSMAP dorsolateral prefrontal cortex (DLPFC) tissue samples. We considered five of their gene lists, containing genes whose expression was associated with AD-related traits: clinical diagnosis of AD at the time of death, cognitive decline, tau, amyloid, and pathologic diagnosis of AD.

Differential Expression Analysis and Age-Associated Gene Expression Identification
Differential expression (DE) analyses of the aging and AD data were performed using the R packages edgeR and limma (Robinson et al., 2010; Law et al., 2014), adjusting for batch, RIN, sex, and PMI in the GTEx data and for brain source, sex, batch, and PMI in the UK data. Gene expression changes associated with age were identified with a linear regression model adjusted for the same covariates as the DE analysis (Yang et al., 2015; Zeng et al., 2020).
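The age-association model can be illustrated with a plain least-squares sketch in NumPy/SciPy. It is a simplification under stated assumptions: edgeR and limma additionally normalize counts and (in limma) moderate gene-wise variances with empirical Bayes, which this code does not do, and the covariate encoding and the function name `age_association` are hypothetical.

```python
import numpy as np
from scipy import stats


def age_association(expr, age, covariates):
    """Per-gene OLS fit of expression ~ intercept + age + covariates.

    expr: genes x samples matrix (log-scale expression)
    age: length-n vector of donor ages
    covariates: n x k matrix of numeric covariates (hypothetical encoding,
        e.g., sex as 0/1, RIN, PMI, one-hot batch columns with one level dropped)
    Returns the age coefficient and its two-sided p-value for every gene.
    """
    expr = np.asarray(expr, dtype=float)
    n = expr.shape[1]
    X = np.column_stack([np.ones(n), np.asarray(age, dtype=float),
                         np.asarray(covariates, dtype=float)])
    dof = n - X.shape[1]
    # Fit all genes at once: beta has shape (n_params, n_genes)
    beta, *_ = np.linalg.lstsq(X, expr.T, rcond=None)
    resid = expr.T - X @ beta
    sigma2 = (resid ** 2).sum(axis=0) / dof
    xtx_inv = np.linalg.inv(X.T @ X)
    # standard error of the age coefficient (column 1 of X)
    se_age = np.sqrt(sigma2 * xtx_inv[1, 1])
    t_age = beta[1] / se_age
    p_age = 2 * stats.t.sf(np.abs(t_age), dof)
    return beta[1], p_age
```

The p-values returned here would then be BH-adjusted across genes to call age-associated genes.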

Subgroup Identification Using Hierarchical Clustering
We used hierarchical clustering to identify subgroups among the normal aging GTEx and UK brain samples. For the GTEx data, clustering was performed on the 5,000 most variable genes, which gave results similar to clustering on all genes (adjusted Rand index > 0.9). We used the ward.D2 method of the R hclust function (Murtagh and Legendre, 2014).
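A sketch of the subgrouping step, assuming expression is a genes x samples NumPy array: select the most variable genes, cluster samples with Ward linkage, and compare partitions with the adjusted Rand index. SciPy's `method="ward"` on raw observations uses Euclidean distances and is intended here to correspond to R's `hclust(..., method = "ward.D2")` on a Euclidean distance matrix; the helper names are hypothetical, and the ARI is written out explicitly rather than taken from a library.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def top_variable_genes(expr, n_top=5000):
    """Keep the n_top rows (genes) with the largest variance across samples."""
    variances = np.var(expr, axis=1)
    keep = np.argsort(variances)[::-1][:n_top]
    return expr[keep]


def ward_clusters(expr, n_clusters):
    """Cluster samples (columns) by Ward linkage on Euclidean distances."""
    Z = linkage(expr.T, method="ward")
    return fcluster(Z, t=n_clusters, criterion="maxclust")


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two partitions of the same samples."""
    _, a = np.unique(labels_a, return_inverse=True)
    _, b = np.unique(labels_b, return_inverse=True)
    table = np.zeros((a.max() + 1, b.max() + 1))
    np.add.at(table, (a, b), 1)  # contingency table of the two labelings

    def pairs(x):
        return x * (x - 1) / 2.0

    sum_cells = pairs(table).sum()
    sum_rows = pairs(table.sum(axis=1)).sum()
    sum_cols = pairs(table.sum(axis=0)).sum()
    expected = sum_rows * sum_cols / pairs(len(a))
    max_index = (sum_rows + sum_cols) / 2.0
    return (sum_cells - expected) / (max_index - expected)
```

Comparing `ward_clusters(top_variable_genes(expr), k)` with `ward_clusters(expr, k)` through `adjusted_rand_index` reproduces the kind of check described in the text (top-variance genes versus all genes).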

Deconvolution of Genotype-Tissue Expression Bulk Tissue Gene Expression Data to Infer Cell-Type Composition
To infer the cell-type composition of GTEx hippocampal samples, we used as reference the RNA-seq data of immunopanning-isolated cells from normal temporal lobe cortex (Zhang et al., 2016), which cover 5 cell types: neurons, astrocytes, endothelial cells, microglia, and oligodendrocytes. Following a recent deconvolution study of GTEx brain samples across multiple regions (Patrick et al., 2020), we used the Digital Sorting Algorithm (DSA) (Zhong et al., 2013) to estimate cell-type proportions. We followed the recommended processing procedures (TMM normalization, top 100 markers) and applied DSA to the HIPP and BA24_AC gene expression data (adjusted for age, sex, PMI, RIN, batch, and 3 genotype PCs) to compare cell-type proportions among the GTEx aging subgroups and between brain regions.
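The marker-based idea that DSA relies on can be sketched as follows. This is a simplified illustration under stated assumptions, not the DSA package itself: markers are assumed to be expressed only in their own cell type, data are on a linear scale, and TMM normalization and marker selection (top 100 markers) are omitted. Under those assumptions the mean marker expression of cell type k in sample j equals c_k p_kj, and because proportions sum to 1, the weights w_k = 1/c_k can be obtained by regressing a vector of ones on the marker means without an intercept. The function name `marker_proportions` is hypothetical.

```python
import numpy as np


def marker_proportions(expr, markers):
    """Estimate cell-type proportions from marker genes (DSA-style sketch).

    expr: genes x samples matrix on a linear (not log) scale
    markers: dict mapping cell type -> list of row indices of its marker genes
    Assumes each marker is expressed only in its own cell type, so the mean
    marker expression of type k in sample j is c_k * p_kj. Because the
    proportions sum to 1, sum_k m_kj / c_k = 1 for every sample: a linear
    model without intercept whose coefficients are w_k = 1 / c_k.
    """
    cell_types = list(markers)
    # M: samples x cell types, mean marker expression per sample
    M = np.column_stack([np.asarray(expr)[markers[k]].mean(axis=0)
                         for k in cell_types])
    w, *_ = np.linalg.lstsq(M, np.ones(M.shape[0]), rcond=None)
    proportions = M * w  # p_kj = w_k * m_kj; rows sum to the fitted values (~1)
    return cell_types, proportions
```

With noisy real data the row sums are only approximately 1, which is one reason marker quality matters for this family of methods.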

Comparison of Parahippocampal Gyrus and Genotype-Tissue Expression Hippocampal Transcriptomes
We obtained gene expression data from 215 parahippocampal gyrus (PHG) samples profiled at Mount Sinai. To compare the PHG and GTEx hippocampal gene expression data, we first calculated log2(TPM + 1) for both datasets, merged them, and removed batch effects using ComBat (R sva package) with age, PMI, sex, and RIN as covariates. Negative values were set to 0, followed by sample-wide quantile normalization. We selected PHG samples with age > 70 and obtained 78 samples [19 normal (CDR = 0, Braak score bbscore ≤ 3, CERAD = "NL") and 59 LOAD (CDR ≥ 1, bbscore ≥ 5, CERAD = "definite AD")]. When we clustered the PHG samples on the 5,000 most variable GTEx genes, the normal and AD samples were mixed to some degree. After removing the mixed samples, the remaining 51 samples (14 normal and 37 LOAD) separated clearly into two groups corresponding to donors' AD status. We also performed DE analysis of LOAD vs. normal in both the filtered (51 PHG samples) and the unfiltered (78 PHG samples) sets and compared the resulting DEGs.
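The transform and normalization steps around batch correction can be sketched as below. ComBat itself is not reimplemented here (the text uses the R implementation); the sketch only shows log2(TPM + 1), setting negative values to 0, and a basic quantile normalization in which every sample is mapped onto the mean sorted profile. Ties are broken by order of appearance, which is a simplification, and the function names are hypothetical.

```python
import numpy as np


def log_tpm(tpm):
    """log2(TPM + 1) transform."""
    return np.log2(np.asarray(tpm, dtype=float) + 1.0)


def clip_and_quantile_normalize(expr):
    """Set negative values (e.g., after batch correction) to 0, then
    quantile-normalize so every sample (column) shares one distribution."""
    x = np.clip(np.asarray(expr, dtype=float), 0.0, None)
    order = np.argsort(x, axis=0, kind="stable")
    # reference distribution: mean across samples of each rank's value
    reference = np.sort(x, axis=0).mean(axis=1)
    out = np.empty_like(x)
    for j in range(x.shape[1]):
        # the gene with the k-th smallest value in sample j gets reference[k]
        out[order[:, j], j] = reference
    return out
```

After this step every column contains exactly the same set of values, so remaining between-sample differences reflect only the ranks of genes within each sample.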

Functional Enrichment Analysis
We annotated the biological functions of each gene list using the DAVID tool (Huang et al., 2009; Sherman and Lempicki, 2009).
We also performed pathway analysis using the MetaCore integrated software suite¹ (last accessed in November 2019) to determine enriched biological processes. Signal transduction network analysis was based on SIGNOR 2.0, the SIGnaling Network Open Resource (Licata et al., 2020; downloaded in August 2020), using the NetworkAnalyst web tool (Zhou et al., 2019).

Deriving Brain Aging and Alzheimer's Disease Signatures From Multiple Transcriptomic Datasets
We collected multiple large-scale brain aging and AD gene expression datasets (Figure 1), which are summarized in Table 1 and Supplementary Table 1. Note that, to obtain "normal" aging gene expression signatures, we removed GTEx samples from donors annotated with any brain-related disease.
We used a linear regression model to identify genes whose expression was associated with donors' chronological age and considered these to be brain aging genes; we call each list of brain aging genes the aging signature of the corresponding brain region. We derived aging signatures for 11 of the 13 brain regions in the GTEx data; two regions, the spinal cord and the substantia nigra, showed no apparent age-associated genes (0 and 1 gene, respectively) and were excluded from further analysis. Similarly, we derived aging signatures for 8 of the 10 brain regions profiled in the UK data; the substantia nigra and medulla showed no apparent aging signatures and were excluded. Genes differentially expressed between AD and normal control samples (which we similarly call AD signatures) were extracted from previously published work. These AD signatures comprise the MS AD sets from the Mount Sinai Medical Center Brain Bank (MSBB) AD cohort (Wang et al., 2016) and the "Other" AD sets, which include AMP-AD knowledge portal data (Hodes and Buckholtz, 2016), the Jager gene lists from the dorsolateral prefrontal cortex (DLPFC) (Mostafavi et al., 2018), the Annese2018 DEG list from hippocampal (HIPP) CA1 (denoted Annese2018 HIPP) (Annese et al., 2018), and the Rooij2019 DEG list from HIPP (denoted Rooij2019 HIPP) (van Rooij et al., 2019) (see Table 1 and section "Materials and Methods").

¹ https://portal.genego.com/

A Global Comparison of Aging and Alzheimer's Disease Signatures Across Brain Regions
FIGURE 1 | Flowchart of the comparison between normal brain aging and Alzheimer's disease (AD). We collect gene expression profiles from a large number of normal aging and AD brain samples across multiple brain regions. We perform both a global comparison (Q1) and a region-specific comparison (Q2) of aging and AD transcriptomes. We also consider subgroups of aging brain samples and compare the aging subgroups with AD (Q3).

Frontiers in Aging Neuroscience | www.frontiersin.org

We first identified genes showing similar expression changes across multiple brain regions in the aging and AD datasets, respectively (we call these the global aging and AD signatures). We found 91 aging genes that are consistently up- or down-regulated in more than 4 of the 19 aging gene lists (11 GTEx and 8 UK brain regions); of these, 27 are up-regulated and 64 are down-regulated with age (see Supplementary Table 2). Similarly, 86 AD signature genes show consistent regulation in at least 9 of the 32 AD gene signature lists, of which 51 are consistently up-regulated and 35 are consistently down-regulated in AD brains (see Supplementary Table 2). We then performed functional annotation of these global aging and AD signatures.
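One way to count "consistently regulated" genes across signature lists is sketched below. The text does not spell out how conflicting calls were handled, so the rule used here (a gene must appear in at least `min_lists` lists in one direction and in no list of the opposite direction) is an assumption, as is the function name. With this convention, "more than 4 of the 19 aging lists" corresponds to `min_lists=5`, and "at least 9 of the 32 AD lists" to `min_lists=9`.

```python
from collections import Counter


def consistent_genes(signatures, min_lists):
    """Genes regulated in the same direction in at least min_lists signatures.

    signatures: list of (up_genes, down_genes) pairs, one per region or study.
    Assumed rule: a gene is 'up' if it is in >= min_lists up lists and in no
    down list, and 'down' symmetrically.
    """
    up_counts, down_counts = Counter(), Counter()
    for up, down in signatures:
        up_counts.update(set(up))
        down_counts.update(set(down))
    # Counter returns 0 for genes never seen in the opposite direction
    up = {g for g, c in up_counts.items() if c >= min_lists and down_counts[g] == 0}
    down = {g for g, c in down_counts.items() if c >= min_lists and up_counts[g] == 0}
    return up, down
```

A stricter or looser rule (for example, a majority vote when a gene appears in both directions) would change the counts, so the rule should be matched to the original analysis before reuse.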

The Global Aging and Alzheimer's Disease Signatures Show Up-Regulation of Immune Complement Pathway Genes and Down-Regulation of Synaptic Related Signaling Genes
Using the DAVID tool, we found that the 91 global aging genes are enriched for synapse, nucleotide-binding, and ion genes (see Supplementary Table 3.1), whereas the 86 global AD signature genes are enriched for synapse, phosphoprotein, and transport genes (see Supplementary Table 3.4). We further used MetaCore to annotate the functions of the consistently up- or down-regulated aging and AD signature genes. The 27 up-regulated aging genes are enriched for regulation of lipid metabolism, insulin regulation of glycogen metabolism (PHK gamma gene; FDR = 2.9E-02), the immune/inflammation complement pathway (C4B, C4, C4A; FDR = 1.9E-06), response to metal ions/transport categories (e.g., cellular response to copper ion; FDR = 1.7E-06), and dopamine metabolic process (MAOB, MAO; FDR = 3.8E-03) (see Supplementary Table 3.2). The 64 down-regulated aging genes are enriched for the MAPK (FDR = 1.2E-02)/ASK1 (FDR = 2.4E-02) pathway, synaptic vesicle/calcium ion exocytosis categories, and synapse-related signaling (NMDA, glutamate) (see Supplementary Table 3.3). The 51 up-regulated AD genes are involved in the immune complement pathway (GPCRs), TGF-beta receptor signaling (FDR = 4.6E-02), cytoskeleton remodeling, and L-glutamate import/neurotransmitter transport (see Supplementary Table 3.5), whereas the 35 down-regulated AD genes are involved in synapse and GABAergic neurotransmission, glutamate secretion/glutamatergic pathways, vesicle fusion/recycling and transport, cytoskeleton remodeling, cell adhesion, and protein phosphorylation (see Supplementary Table 3.6). Thus, both the aging and the AD global signatures show up-regulation of the immune complement pathway and down-regulation of synapse-related pathways (especially glutamate signaling and synaptic vesicle recycling). Beyond these shared features, the aging genes are more involved in metal ion response, transport, and glycogen metabolism, whereas the AD signature genes are more involved in cytoskeleton remodeling, cell adhesion, and synaptic categories.

SST and SVOP Are Down-Regulated and FOXJ1 and SLC44A1 Are Up-Regulated in the Global Aging and Alzheimer's Disease Signatures
Two genes, SST and SVOP are consistently down-regulated in both global aging and AD gene signatures, while FOXJ1 and SLC44A1 are consistently up-regulated.
SST (somatostatin) is a neuropeptide hormone that maintains the permeability and integrity of the blood-brain barrier (BBB) by regulating LRP1 and RAGE expression and by abrogating Aβ-induced JNK phosphorylation and MMP2 expression (Paik et al., 2019). Reduced SST levels in the CSF and brain tissue are associated with impaired cognitive function and memory loss (Solarski et al., 2018). SVOP (synaptic vesicle 2-related protein) can bind adenine nucleotides (particularly NAD) via its C-terminal region (Yao and Bajjalieh, 2009). Although the functions of SVOP remain obscure, its evolutionary conservation and homology to transporters suggest that it may play a role in molecular trafficking in synaptic vesicles (Janz et al., 1998).
FOXJ1, a member of the Forkhead/winged helix (FOX) family of transcription factors, is involved in ciliogenesis in mammalian cells (Yu et al., 2008). It has been shown to suppress NF-κB, a key regulator of the immune response (Lin et al., 2004). FOXJ1 has been hypothesized to play a protective role in the pathophysiology of brain injury and may be required for cell differentiation (Jacquet et al., 2009). Solute carrier 44A1 (SLC44A1) is a plasma membrane choline transporter; it is also a mitochondrial protein that acts as a choline transmembrane transporter. Choline is essential for the synthesis of the neurotransmitter acetylcholine (ACh) by cholinergic neurons, which regulates neuronal activity. Choline deprivation in the central nervous system reduces ACh release, memory retention, and spatial cognition in the hippocampus (Canty and Zeisel, 1994; Nakamura et al., 2001; Zeisel, 2007).

Brain Region-Specific Comparison of Aging and Alzheimer's Disease Signatures
In addition to comparing the global aging and AD signatures, we also compared aging and AD signatures in a brain-region-specific manner.

Hypothalamus, Hippocampus, and Certain Cortical Regions Show Stronger Age-Related Gene Expression Changes Compared to Other Brain Regions
As shown in Supplementary Figure 1, the number of aging genes varies across brain regions. In the GTEx data, the regions with the most aging genes are the hypothalamus (HYPO: 490 UP and 1,160 DN genes), hippocampus (HIPP: 705 UP and 698 DN genes), and anterior cingulate cortex BA24 (BA24_AC: 320 UP and 888 DN genes). In the UK data, the regions with the most aging genes are the hippocampus (HIPP: 432 UP and 324 DN genes) and temporal cortex (TCTX: 91 UP and 419 DN genes). Note that GTEx and UK do not cover the same brain regions; for example, the hypothalamus was profiled in GTEx but not in the UK data. Among the 4 regions profiled in both GTEx and UK (HIPP, CRBL, PUTM, and FCTX), HIPP and CRBL show more aging genes and stronger overlap between the GTEx and UK aging signatures than the other two regions. Interestingly, the hippocampal aging signature overlaps significantly with aging signatures from several cortical regions, whereas the cerebellar aging signature has very little overlap with other regions. The cerebellum shows the most distinct gene expression pattern of all brain regions, as also reported in the Allen Brain Atlas study (Mahfouz et al., 2015).
Because a larger sample size provides greater statistical power and generally allows more age-associated genes to be identified, a fair comparison across brain regions requires identical or very similar sample sizes. GTEx HIPP (N = 57), BA24_AC (N = 56), and HYPO (N = 53) have relatively small sample sizes compared with other brain regions (see Table 1), so the larger number of age-associated genes in these regions is unlikely to be explained by sample size. In the UK data, however, the hippocampus had a relatively large sample size (N = 93) compared with other regions such as TCTX (N = 82), the region with the smallest sample size. To ensure that the large number of age-associated genes identified in UK HIPP was not due to its sample size, we performed a down-sampling test in three brain regions, comparing their age-associated genes at equal sample sizes. As shown in Supplementary Table 4, after down-sampling, 565.76 ± 483.63 (mean ± standard deviation) age-associated genes were identified in UK HIPP. Although this is far fewer than the 959 age-associated genes identified from all 93 HIPP samples, HIPP still shows the largest number of aging genes among all UK brain regions.
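The down-sampling test can be sketched as repeated subsampling without replacement followed by recounting age-associated genes. To keep the example short, it replaces the covariate-adjusted regression with a per-gene Pearson correlation test and BH correction; that substitution, the number of repetitions, and the function names are assumptions rather than the procedure behind Supplementary Table 4.

```python
import numpy as np
from scipy import stats


def count_age_genes(expr, age, fdr_cutoff=0.05):
    """Count genes whose expression correlates with age at BH-FDR < fdr_cutoff.

    Simplification: a per-gene Pearson correlation t-test with no covariates.
    """
    expr = np.asarray(expr, dtype=float)
    age = np.asarray(age, dtype=float)
    n = expr.shape[1]
    a = (age - age.mean()) / age.std()
    z = (expr - expr.mean(axis=1, keepdims=True)) / expr.std(axis=1, keepdims=True)
    r = z @ a / n  # Pearson correlation of each gene with age
    t = r * np.sqrt((n - 2) / (1 - r ** 2))
    p = 2 * stats.t.sf(np.abs(t), n - 2)
    # Benjamini-Hochberg on the sorted p-values
    m = p.size
    q = np.sort(p) * m / np.arange(1, m + 1)
    q = np.minimum.accumulate(q[::-1])[::-1]
    return int(np.sum(q < fdr_cutoff))


def downsample_counts(expr, age, n_sub, n_rep=100, seed=0):
    """Mean and SD of age-gene counts over random subsets of n_sub samples."""
    rng = np.random.default_rng(seed)
    expr = np.asarray(expr, dtype=float)
    age = np.asarray(age, dtype=float)
    counts = []
    for _ in range(n_rep):
        idx = rng.choice(expr.shape[1], size=n_sub, replace=False)
        counts.append(count_age_genes(expr[:, idx], age[idx]))
    counts = np.array(counts)
    return counts.mean(), counts.std(ddof=1)
```

Setting `n_sub` to the smallest region's sample size (82 for TCTX in the text) puts every region on the same footing before their counts are compared.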
In summary, brain aging signatures are highly region-specific. Even after accounting for differences in sample size, the hippocampus, hypothalamus, and the TCTX and BA24_AC cortical regions are more affected by aging at the transcriptome level than the other brain regions surveyed in the GTEx and UK data.

Common Hippocampal Aging Genes Between Genotype-Tissue Expression and UK Data
In general, aging signatures from GTEx and UK are largely reproducible in matched brain regions. Taking the hippocampus as an example, the 705 up-regulated GTEx aging genes share 77 genes with the 431 up-regulated UK aging genes (FDR = 2.6E-31), and the 698 down-regulated GTEx aging genes share 42 genes with the 324 down-regulated UK aging genes (FDR = 3.6E-12). The 119 hippocampal aging genes commonly up- or down-regulated in GTEx and UK are enriched for phosphoprotein, alternative splicing, acetylation, complement, and synapse (see Supplementary Table 5.1). In addition to the immune/inflammation categories found in the global aging signatures, the 77 commonly up-regulated aging genes are enriched for TGF-β signaling, transcriptional regulation (e.g., mRNA transcription by RNA polymerase II, FDR = 2.2E-03; ATP-dependent chromatin remodeling, FDR = 7.3E-03), blood vessel remodeling, membrane protein intracellular domain proteolysis, protein transport, and metabolic process (see Supplementary Table 5.2). Furthermore, in signal transduction network analysis (see Supplementary Tables 5.4, 5.5), the top signaling network contains 67 up-regulated genes and is connected with many metabolism-related pathways, such as endocrine resistance (FDR = 4.0E-15), the FoxO signaling pathway (FDR = 1.4E-13), the thyroid hormone signaling pathway (FDR = 8.7E-12), the sphingolipid signaling pathway (FDR = 1.8E-10), the neurotrophin signaling pathway (FDR = 1.8E-10), autophagy - animal (FDR = 3.6E-10), and insulin resistance (FDR = 1.5E-09).
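The text reports FDR values for signature overlaps without naming the test; a common choice is the one-sided hypergeometric (Fisher) test, sketched below under that assumption. The gene universe size `n_universe` (for example, the number of genes tested in both datasets) is not given in the text and must be supplied; p-values from many signature pairs would then be BH-adjusted. The function name is hypothetical.

```python
from math import comb


def overlap_pvalue(n_a, n_b, n_overlap, n_universe):
    """One-sided hypergeometric p-value: P(overlap >= n_overlap).

    Models the second gene set (size n_b) as a random draw from n_universe
    genes, of which n_a belong to the first set. Uses exact integer
    arithmetic, so very small p-values may underflow to 0.0.
    """
    total = comb(n_universe, n_b)
    upper = min(n_a, n_b)
    tail = sum(comb(n_a, k) * comb(n_universe - n_a, n_b - k)
               for k in range(n_overlap, upper + 1))
    return tail / total
```

For instance, `overlap_pvalue(705, 431, 77, 20000)` evaluates the GTEx/UK up-regulated hippocampal overlap under a hypothetical universe of 20,000 genes; the result depends strongly on that choice, so it will not necessarily reproduce the reported FDR. The same tail probability is available in SciPy as `scipy.stats.hypergeom.sf(n_overlap - 1, n_universe, n_a, n_b)`.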
Based on MetaCore analysis, the 42 commonly down-regulated aging genes between GTEx and UK are related to insulin-like growth factor 1 (IGF-1) signaling, the ERK1/2 pathway (for caveolin-mediated endocytosis or signal transduction), methylation, cell adhesion (e.g., synaptic contact), and long-term potentiation, as well as previously reported energy-dependent pathways such as synaptic categories (e.g., synaptic vesicle) and oxidative categories (see Supplementary Table 5.3). IGF-1 is a growth factor and neurohormone, and some evidence suggests its involvement in neurocognitive function, neuroinflammation, and amyloid-β clearance. Furthermore, in signal transduction network analysis based on the KEGG database (see Supplementary Tables 5.6, 5.7), the top signaling network is involved in circadian entrainment (FDR = 9.2E-07) and multiple synapse-related pathways.
Brain Aging Signatures From the Hypothalamus, the Hippocampus, and BA24 Anterior Cingulate Cortex Significantly Overlap With Alzheimer's Disease Signatures
After evaluating the aging signatures in the GTEx and UK data, we compared aging signatures with AD signatures in a brain-region-specific manner. For this analysis, we additionally collected two hippocampal DEG lists (Rooij2019 and Annese2018, listed among the "Other" AD sets). As shown in Figure 2, strong overlap between aging and AD signatures is observed in the hippocampus, hypothalamus, and several cortical regions. For example, the GTEx hippocampal aging signature overlaps strongly with AD signatures derived from either the hippocampus or several cortical regions: the 698 GTEx_HIPP_DN aging genes share 288 genes with Rooij2019_HIPP_DN (adjusted P-value = 2.78E-130), and the 705 GTEx_HIPP_UP genes share 150 genes with Mayo_TCTX_UP (1,578) (adjusted P-value = 6.62E-28). Notably, when aging and AD signatures are split into up- and down-regulated genes, in most cases the down-regulated aging genes overlap significantly only with down-regulated AD genes, and vice versa, which supports that these signatures represent real biological signals. The overlap pattern is highly brain-region-specific. For example, GTEx_HIPP_UP (705) overlaps significantly with AD signatures from the hippocampus, whereas it shares only 21 genes (adjusted P-value = 0.72) with AD signatures from non-hippocampal regions such as MS_BM44_IFG_UP (513). In both the GTEx and UK data, the cerebellar aging signature shows relatively weak overlap with our AD signatures.
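A heatmap of the kind described for Figure 2 starts from a table of pairwise overlap counts. The sketch below builds that table from gene sets and sums overlaps in same-direction pairs (UP with UP, DN with DN) versus opposite-direction pairs, the check behind the observation that down-regulated aging genes mainly overlap with down-regulated AD genes. The `_UP`/`_DN` naming convention follows the signature IDs in the text; the functions themselves are hypothetical.

```python
def overlap_matrix(ad_signatures, aging_signatures):
    """Number of shared genes for every (AD signature, aging signature) pair.

    Both arguments map a signature ID to a set of genes, with up- and
    down-regulated genes stored as separate IDs ending in "_UP" / "_DN".
    Returns counts[ad_id][aging_id].
    """
    return {
        ad_id: {ag_id: len(ad_genes & ag_genes)
                for ag_id, ag_genes in aging_signatures.items()}
        for ad_id, ad_genes in ad_signatures.items()
    }


def direction_split(counts):
    """Total shared genes in same-direction vs. opposite-direction pairs."""
    same = opposite = 0
    for ad_id, row in counts.items():
        for ag_id, n in row.items():
            if ad_id.endswith("_UP") == ag_id.endswith("_UP"):
                same += n
            else:
                opposite += n
    return same, opposite
```

Each cell of the count table would then be paired with an overlap p-value to color the heatmap; a large `same` total relative to `opposite` is what the text interprets as a real biological signal.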
We also observed that the GTEx and UK hippocampal aging signatures show similar patterns of overlap with AD signatures across brain regions. Likewise, different AD signatures, such as Annese2018 and Rooij2019, show similar patterns of overlap with aging signatures across brain regions, further suggesting that the aging and AD signatures from different studies are biologically meaningful and capture real signals of aging and AD.

Functional Enrichment of Aging Specific, Aging/Alzheimer's Disease Common and Alzheimer's Disease Specific Genes in Hippocampus
FIGURE 2 | Overlap between aging and AD signatures in different brain regions. AD signatures are plotted in rows and aging signatures in columns. Each signature is separated into up- and down-regulated genes, and the number of genes in each signature is listed after its ID. The number in each heatmap cell indicates how many genes the corresponding aging and AD signatures share, and the color indicates the significance of the overlap.

Since aging and AD gene signatures overlap strongly in the hippocampus (Figure 2), we focused on this brain region to investigate the functional enrichment of aging and AD genes. For robustness, we annotated hippocampal aging genes derived from both GTEx and UK together with hippocampal AD genes derived from both Annese2018 and Rooij2019 (Figure 3). We divided the genes in the aging and AD signatures into three categories: aging-specific signature genes (ASGs), aging-AD common signature genes (AADGs), and AD-specific signature genes (ADSGs). Phosphoprotein and alternative splicing are enriched in ASGs, AADGs, and ADSGs (in both up- and down-regulated genes), whereas glycosylation/glycoprotein, immune/inflammatory, stress response, and mitochondria categories are more enriched in ADSGs, and acetylation, nucleus, and Ubl conjugation are mostly enriched in ASGs (see Supplementary Table 6.1). Down-regulated ASGs, AADGs, and ADSGs are all associated with membrane/cytoskeleton categories. Down-regulated AADGs are enriched for synapse (including cell junction, cholinergic/GABAergic/dopaminergic/glutamatergic synapse, and AMPA) and ion/transport categories (Figure 3 and Supplementary Table 6). This comparison suggests that, unlike normal aging, AD-specific and aging/AD common genes are more enriched for glycoprotein, inflammatory response, synapse, and mitochondrial dysfunction, whereas aging-specific changes are more related to the nucleus, acetylation, and coiled coil, which are less directly related to neurodegenerative phenotypes. The functional annotation of AADGs suggests that dysregulation of transcriptional regulation, energy metabolism, membrane remodeling, extracellular vesicle (EV), and synapse pathways has already begun and progressed to some degree during normal brain aging, even though these individuals remain cognitively normal, whereas other biological processes such as inflammation/immune response are further escalated in AD patients.
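The three-way split into ASGs, AADGs, and ADSGs amounts to set operations. The sketch below uses one reading of the category definitions given in the text (AADGs are shared by at least one aging list and one AD list; ADSGs are conserved across the AD lists and absent from every aging list); the exact rules are an assumption, the function name is hypothetical, and in practice the split would be applied separately to up- and down-regulated lists.

```python
def partition_signatures(aging_lists, ad_lists):
    """Split genes into aging-specific, shared, and AD-specific sets.

    aging_lists: e.g., [GTEx HIPP genes, UK HIPP genes] as sets
    ad_lists: e.g., [Annese2018 genes, Rooij2019 genes] as sets
    Assumed definitions:
      AADGs: in at least one aging list and at least one AD list
      ASGs:  in some aging list but in no AD list
      ADSGs: in every AD list (conserved) but in no aging list
    """
    aging_any = set().union(*aging_lists)
    ad_any = set().union(*ad_lists)
    ad_conserved = set.intersection(*map(set, ad_lists))
    aadgs = aging_any & ad_any
    asgs = aging_any - ad_any
    adsgs = ad_conserved - aging_any
    return asgs, aadgs, adsgs
```

Each of the three returned sets would then be submitted separately to an enrichment tool such as DAVID.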

Subgroups Can Be Found in Both Genotype-Tissue Expression and UK Hippocampus Datasets
Although we have shown that aging and AD share multiple gene expression changes in the hippocampus, it is well known that not every older individual will develop AD. This suggests that the interconnection between normal aging and AD may vary in intensity across the aging population. To examine the potential heterogeneity of normal brain aging, we explored subgrouping in both the GTEx and UK hippocampal samples and found that both can be divided into major subgroups. As shown in Figure 4, 56 GTEx hippocampal samples from donors aged 45 to 70 form three major clusters (subgroups A, B, and C). Samples from different age groups are relatively evenly distributed across these subgroups, suggesting that the subgrouping is not driven by differences in donor age. Similar subgrouping is observed in the UK hippocampal data (Figure 4B).

FIGURE 3 | We consider the functional enrichment of aging-specific genes (ASGs), conserved AD-specific genes (ADSGs) between two AD signatures, and the overlap between aging and AD genes (AADGs), which represents genes shared by at least one aging list and one AD list. We list the most representative functional categories with FDR < 0.05. To reduce redundancy, only one representative functional category was selected from each identified cluster of functions.

Differentially Expressed Genes Derived From Comparing Aging Brain Subgroups Highly Overlap With Alzheimer's Disease Signatures
We performed pairwise comparisons between subgroups to derive DEGs in the GTEx and UK datasets, respectively, and compared these DEGs (FDR < 0.01) with 3 hippocampal AD signatures (MS, Rooij2019, and Annese2018). As shown in Figure 5, DEGs from comparing GTEx subgroup B vs. A (denoted GTEx DEG BvsA) overlap strongly with the AD signatures from Rooij2019 and Annese2018. The 3,018 down-regulated GTEx_DEG_BvsA genes share 1,274 genes with the 1,622 Rooij2019 down-regulated genes, but only 7 genes with the 1,045 Rooij2019 up-regulated genes. This overlap is highly significant; for comparison, the Rooij2019 down-regulated genes (1,622 genes) overlap with another AD signature, the Annese2018 down-regulated genes (1,183 genes), by 614 genes, which is comparable to the overlap we see with the aging subgroup DEGs. Note that the GTEx samples used in the subgroup analysis are from donors 45-70 years old without AD or other brain diseases. Simply by subgrouping these samples, we obtained DEGs that overlap strongly with AD signatures from independent studies, which further supports that gene expression changes in a subpopulation of cognitively normal brains are strongly interconnected with AD. Similarly, GTEx DEG CvsA also overlaps strongly with AD signatures. Based on the pattern of overlap, we infer that GTEx subgroup A is more likely to be a "healthy" aging subgroup, whereas GTEx subgroups B and C are more similar to AD (here denoted "AD similar" subgroups). Because GTEx DEG CvsB shows overlap in the reverse direction with the AD signatures from Annese2018 and Rooij2019, subgroup B appears more similar to AD samples than subgroup C.
Therefore, ordered by similarity to AD samples, the subgroups rank as AD samples > subgroup B > subgroup C > subgroup A, which is also consistent with the observation that GTEx DEG CvsA overlaps with GTEx DEG BvsA in the same direction of regulation. For the UK data, we also observed very strong overlap between the DEGs from comparing various subgroups and the AD signatures. For example, UK_DEG_AvsC_DN (3,144 genes) shares 762 genes with the Rooij2019_DN signature (1,622 genes). Similarly, we inferred the order of UK subgroups judged by how similar

Functional Annotation of Genotype-Tissue Expression and UK Subgroup Differentially Expressed Genes With Alzheimer's Disease Signatures
We next used the DAVID tool to annotate the functional enrichment of GTEx and UK subgroup DEGs (Figure 6 and Supplementary Table 7). We focused on the DEGs (FDR < 0.01) from comparing GTEx subgroup B (a subgroup more similar to AD) with the relatively healthy aging subgroup A (GTEx DEGs BvsA) and, analogously, on UK DEGs AvsC, and compared them with two AD signatures (Annese2018 and Rooij2019). As in the previous annotations, we denote the aging subgroup DEGs as ABGs and annotate the functional enrichment of ABSGs (aging subgroup DEG-specific genes), ADSGs, and ABADGs (genes shared between aging subgroup DEGs and AD signature genes). The overlap of ABSGs between GTEx and UK is denoted "Conserved ABSGs", and the union of Rooij2019 and Annese2018 after subtracting any ABGs is denoted "ADSGs". We list the most representative functional categories with FDR < 0.05; to reduce redundancy, only one representative category was selected from each identified cluster of functions. As shown in Supplementary Table 7.1 and Figure 6, certain post-translational modification (PTM) categories (e.g., phosphoprotein, alternative splicing, Ubl conjugation, and acetylation), as well as membrane and cytoskeleton categories, are significantly enriched in ABSGs, ADSGs, and ABADGs, in both up- and down-regulated genes. ADSGs (both up- and down-regulated) are enriched for glycoprotein, while up-regulated ADSGs are more enriched for immunity and inflammatory response categories, and down-regulated ADSGs are enriched for calcium/ion categories. The up-regulated conserved ABSGs show enrichment in the cell cycle category, and the down-regulated conserved ABSGs show enrichment in proteostasis categories such as proteasome, protein transport, and protein binding.
For ABADGs, up-regulated genes are mainly enriched for transcription regulation and metabolism categories (e.g., PI3K-Akt signaling pathway, MAPK signaling pathway, insulin resistance), while down-regulated genes are mainly enriched for synapse, lipoprotein, and circadian entrainment categories.
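The gene categories used in this annotation are set operations on gene lists. The Python sketch below is one reading of those definitions under stated assumptions: genes are handled separately by direction, "subtracting any ABGs" is taken to mean removing ABGs of either direction, and ABSGs are likewise assumed to exclude any AD signature gene. The function names are illustrative only.

```python
def partition_genes(aging_up, aging_dn, ad_up_sets, ad_dn_sets):
    """Split genes into ABSG / ADSG / ABADG categories per direction.

    aging_up, aging_dn: sets of aging subgroup DEGs (ABGs) by direction.
    ad_up_sets, ad_dn_sets: lists of AD signature gene sets by direction
    (e.g. Rooij2019 and Annese2018); their union forms the AD signature.
    """
    ad_up = set().union(*ad_up_sets)
    ad_dn = set().union(*ad_dn_sets)
    all_abg = aging_up | aging_dn
    all_ad = ad_up | ad_dn
    result = {}
    for direction, abg, ad in (("up", aging_up, ad_up), ("dn", aging_dn, ad_dn)):
        result[direction] = {
            "ABSG": abg - all_ad,   # aging subgroup DEG-specific genes
            "ADSG": ad - all_abg,   # AD signature genes minus any ABG
            "ABADG": abg & ad,      # shared genes, same direction of change
        }
    return result


def conserved_absgs(absg_gtex, absg_uk):
    """ABSGs found in both the GTEx and the UK comparison."""
    return absg_gtex & absg_uk
```

Under these assumptions, a gene whose direction disagrees between the two lists (for example, up in the aging comparison but down in AD) falls into none of the three categories.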
In summary, the functional analysis of aging subgroup DEGs further suggests that remodeling of PTMs (e.g., phosphoprotein and alternative splicing), proteostasis, cytoskeleton, and metabolism is likely initiated in normal aging subgroups whose gene expression changes are highly similar to those of AD patients.

Differentially Expressed Genes Between Subgroups Could Be Partially Driven by Differences in Cellular Composition
The differentially expressed genes we observed between subgroups could result from dysregulated gene expression, from changes in cellular composition, or from both. Because GTEx and UK are bulk-tissue gene expression datasets, cellular composition information was not available. To evaluate whether cellular compositions differed between subgroups, we performed computational cell deconvolution. Following a recent cell deconvolution study that included experimental validation (Patrick et al., 2020), we used the DSA (Digital Sorting Algorithm) method (Zhong et al., 2013) and cell-type reference data (Zhang et al., 2016) to estimate cell-type proportions (see section "Materials and Methods"). We were able to reproduce Patrick et al. (2020)'s results in the GTEx hippocampus when we made no adjustment to the GTEx gene expression data (Supplementary Figure 2A). When we adjusted the GTEx data for covariates such as age, sex, and PMI, we noticed some differences in the estimated proportions of several cell types compared with Patrick et al. (2020)'s results (Supplementary Figure 2B). When we applied this approach to the GTEx HIPP gene expression data, neurons, oligodendrocytes, and microglia showed significant differences among the GTEx aging subgroups (Figure 7). Interestingly, GTEx subgroup A (the healthy aging subgroup) showed higher proportions of neurons than the AD similar subgroups; GTEx subgroup C (an AD similar subgroup) showed an elevated proportion of microglial cells; and GTEx subgroup B (another AD similar subgroup) showed an elevated proportion of oligodendrocytes. In addition, the cell-type proportions of GTEx subgroup A were highly similar to those of the PHG normal control group, while subgroups B and C were more similar to the PHG AD group. Not surprisingly, the control samples that clustered with AD samples showed estimated cell-type proportions similar to those of AD samples, while the AD samples that clustered with control samples showed proportions similar to those of the control group. Similarly, as shown in Supplementary Figure 3, the proportions of almost all five cell types differed between the hippocampus and BA24 (the two brain regions we chose to compare), suggesting that cell-proportion differences exist widely across brain aging subtypes and brain regions.
FIGURE 7 | Subgroup comparison of the estimated proportions of 5 cell types in GTEx HIPP and PHG data using the DSA method and Zhang's reference data. Proportions of the 5 cell types are shown for the GTEx subgroups (36 A, 9 B, and 11 C) and for PHG normal (14 NL), Mixed normal (normal samples that clustered with AD samples; 5 M_NL), LOAD (37 AD), and Mixed LOAD (AD samples that clustered with normal control samples; 22 M_AD). The Kruskal-Wallis rank sum test and the Wilcoxon rank sum test were used to calculate significance levels between groups.
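The marker-gene idea behind DSA can be illustrated in a few lines. The simplified Python sketch below assumes linear-scale (not log) expression and a given set of marker genes per cell type (for example, selected from a reference such as Zhang et al., 2016). Each cell type's summed marker expression is taken to be proportional to its proportion, and the unknown scale factors are fitted so that proportions sum to one in every sample. Ordinary least squares stands in for the constrained solver of the published method, so this approximates the DSA idea and is not the implementation used in the study; `dsa_like_proportions` is a hypothetical name.

```python
import numpy as np


def dsa_like_proportions(bulk, marker_rows):
    """Estimate cell-type proportions from marker genes (simplified DSA idea).

    bulk: (genes x samples) array of linear-scale expression.
    marker_rows: list with one array of row indices per cell type.
    Returns a (cell types x samples) array whose columns sum to 1.
    """
    # Summed marker expression per cell type and sample, shape (K, J).
    # If markers are expressed only in their own cell type,
    # s[k, j] = e[k] * p[k, j] for an unknown per-type scale e[k].
    s = np.vstack([bulk[rows].sum(axis=0) for rows in marker_rows])
    # Proportions sum to one, so sum_k s[k, j] / e[k] = 1 for every sample j:
    # solve s.T @ c = 1 for c = 1 / e by least squares.
    c, *_ = np.linalg.lstsq(s.T, np.ones(s.shape[1]), rcond=None)
    c = np.clip(c, 0.0, None)  # scale factors cannot be negative
    props = c[:, None] * s
    return props / props.sum(axis=0, keepdims=True)
```

Because the proportions are estimated from marker genes alone, the approach inherits the assumption discussed later in this article: marker expression per cell is treated as constant across samples, so a cell-type-specific change in marker expression would be read as a change in proportion.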

Genotype-Tissue Expression "Healthy Aging" Subgroup Will Likely Remain Cognitively Normal in Older Ages as Implied From Joint Analysis With Another Brain Transcriptomic Dataset
In both GTEx and UK datasets, we observed one subgroup that is quite different from AD and other subgroups that are more similar to AD at the transcriptome level. We hypothesize that the subgroup least similar to AD is a "healthy aging" subgroup and that donors in this subgroup would have a better chance of remaining cognitively normal if they lived into their 70s or older. In contrast, individuals in the "AD similar" subgroups would likely have a higher chance of developing cognitive deficits if they lived into their 70s or older. Directly testing this hypothesis would require a longitudinal study following these subgroup individuals for several decades. Such a study is not feasible, mainly because brain tissue is largely available only from postmortem donors, which precludes longitudinal sampling. We therefore had to rely on alternative approaches to evaluate the hypothesis.
To indirectly test the hypothesis, we relied on another large brain transcriptomic dataset that profiled more than 200 brain tissues from the parahippocampal gyrus (PHG) region. Although the PHG is a different brain region, it is adjacent to the hippocampus, and our analysis showed that the two transcriptomic datasets are comparable. From these PHG samples, we selected two groups: control and AD samples. Normal control samples were chosen from donors with age > 70, CDR = 0, Braak score (bbscore) ≤ 3, and CERAD = 'NL'; 19 samples met these criteria, with donor ages ranging from 73 to 103 (mean ± standard deviation, 83.3 ± 8.7 years). For the AD group, we required donors to be > 70 years old with CDR ≥ 1, bbscore ≥ 5, and CERAD = 'definite AD'; 59 samples met these criteria, with donor ages ranging from 71 to 104 (87.0 ± 8.4 years). We merged these PHG data with the GTEx hippocampus RNA-seq data and corrected the batch effect using ComBat in R (see section "Materials and Methods") to form a unified dataset. Because the joint processing introduced some changes to the GTEx gene expression values, we repeated the hierarchical clustering on the 56 GTEx samples and observed that the new subgrouping structure (subgroup B [n = 11] > C [n = 10] > A [n = 35]) is highly similar to the previous clustering result (subgroup B' [n = 9] > C' [n = 11] > A' [n = 36]). For example, the new subgroup A (35 samples) shares 32 samples with the previous subgroup A' (36 samples). Hierarchical clustering of the 78 PHG samples showed that normal control and AD samples were partially mixed (see Supplementary Figure 4A). This partial mixing is not unexpected. The control samples came from cognitively normal donors.
As with the GTEx and UK brain data, we have repeatedly observed that a subgroup of cognitively normal individuals shows gene expression changes highly similar to AD. On the other hand, some "AD" samples were mixed with the control samples, possibly reflecting AD subtypes (Neff et al., 2021): in some AD subtypes, the PHG region may be relatively normal.
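ComBat adjusts each gene for batch-specific shifts in mean and variance and then shrinks those batch estimates toward a common prior with empirical Bayes. The Python sketch below keeps only the location/scale step, with no shrinkage and no protected biological covariates, so it is a simplified illustration of the idea rather than ComBat itself; `location_scale_adjust` is a hypothetical name. It assumes log-scale expression and at least two samples per batch.

```python
import numpy as np


def location_scale_adjust(expr, batch):
    """Remove per-gene batch differences in mean and spread.

    expr: (genes x samples) log-scale expression; batch: one label per sample.
    Each batch is standardised per gene, then rescaled to the pooled
    within-batch standard deviation and shifted to the overall gene mean.
    """
    expr = np.asarray(expr, dtype=float)
    batch = np.asarray(batch)
    labels = np.unique(batch)
    grand_mean = expr.mean(axis=1, keepdims=True)
    # Residuals around each batch's own mean give the pooled within-batch variance.
    resid = np.empty_like(expr)
    for b in labels:
        cols = batch == b
        resid[:, cols] = expr[:, cols] - expr[:, cols].mean(axis=1, keepdims=True)
    dof = expr.shape[1] - len(labels)
    pooled_sd = np.sqrt((resid ** 2).sum(axis=1, keepdims=True) / dof)
    out = np.empty_like(expr)
    for b in labels:
        cols = batch == b
        sd = expr[:, cols].std(axis=1, ddof=1, keepdims=True)
        sd[sd == 0] = 1.0  # constant genes: avoid division by zero
        out[:, cols] = resid[:, cols] / sd * pooled_sd + grand_mean
    return out
```

Without protected covariates, any biological difference that is unevenly distributed across batches (here, AD samples exist only in the PHG batch) is partly absorbed as batch effect; this is one reason the full ComBat method lets users supply a model of covariates to preserve.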
Because we wanted samples that truly represent normal control and AD based on their gene expression, and including the mixed samples would likely dilute the contrast between healthy and AD samples and blur the biological signals, we removed the mixed samples. This yielded 14 normal samples (ages 73–103; 5 males and 9 females) and 37 AD samples (ages 73–104; 11 males and 26 females), which formed two fully separated groups based on gene expression (see Supplementary Figure 4B). We then assigned these PHG samples to the GTEx subgroups by adding one PHG sample at a time to the 56 GTEx samples for hierarchical clustering. All 14 control samples clustered with the GTEx "healthy aging" subgroup A. Of the 37 PHG AD samples, 5 (3 APOE ε3/ε4 and 2 APOE ε3/ε3) clustered with GTEx subgroup B and 26 with GTEx subgroup C (the two "AD similar" subgroups), while only 6 samples (all APOE ε3/ε3) clustered with the "healthy aging" subgroup A. This result suggests that the gene expression of the "healthy aging" subgroup is mainly associated with cognitively normal status in older individuals, while the gene expression of the "AD similar" subgroups is largely associated with AD in this older cohort. To check robustness, we also repeated the analysis with the mixed samples retained; the results are summarized in Supplementary Table 8. Among the 78 PHG samples (age > 70), 5 control samples were mixed with AD samples and 22 AD samples were mixed with controls (27 mixed samples in total). We again assigned the PHG samples to the GTEx subgroups by adding one sample at a time to the hierarchical clustering. Four of the mixed control samples clustered with GTEx subgroup C and the remaining one with subgroup A. Of the mixed AD samples, 7 clustered with GTEx subgroup C, 1 with subgroup B (the first AD similar subgroup), and 14 with subgroup A.
This is largely consistent with our expectations: the control samples mixed with AD samples clustered mostly with an AD similar subgroup, while the majority of the AD samples mixed with control samples (14/22) clustered with the healthy aging subgroup (GTEx subgroup A). Considering all 78 PHG samples, our observation remains largely the same: most control samples (14 out of 19) clustered with the GTEx healthy aging subgroup, while the majority of AD samples (37 out of 59) clustered with the GTEx AD similar subgroups. In addition, as shown in Supplementary Figure 5, the filtered set (51 PHG samples) captured much stronger gene expression differences between AD and control samples than the DEGs obtained when the mixed samples were retained (78 PHG samples).
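The one-sample-at-a-time assignment described above can be sketched as follows. The linkage method, distance measure, and number of clusters are not specified in this section, so the Python sketch assumes average linkage on one minus the Pearson correlation between samples and a cut into three clusters; `assign_sample` and its majority-vote rule are hypothetical choices for illustration. A naive clustering loop is written out so that the sketch depends only on NumPy.

```python
import numpy as np


def average_linkage(dist, k):
    """Naive agglomerative clustering with average linkage; returns labels."""
    clusters = [[i] for i in range(dist.shape[0])]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average pairwise distance between the two clusters.
                d = dist[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    labels = np.empty(dist.shape[0], dtype=int)
    for lab, members in enumerate(clusters):
        labels[members] = lab
    return labels


def assign_sample(reference, ref_groups, new_sample, k=3):
    """Cluster the reference samples plus one new sample and return the
    reference subgroup that dominates the new sample's cluster."""
    data = np.column_stack([reference, new_sample])  # genes x (n_ref + 1)
    dist = 1.0 - np.corrcoef(data.T)  # correlation distance between samples
    labels = average_linkage(dist, k)
    mates = labels[:-1] == labels[-1]
    if not mates.any():
        return None  # the new sample formed a cluster of its own
    groups, counts = np.unique(np.asarray(ref_groups)[mates], return_counts=True)
    return groups[np.argmax(counts)]
```

Adding one sample at a time keeps each new sample from pulling the reference subgroups apart, at the cost of re-running the clustering once per sample.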

DISCUSSION
To better understand the interconnection between aging and LOAD, we collected gene expression profiles from several large-scale transcriptomic datasets covering multiple brain regions and systematically compared aging and LOAD gene expression signatures. Different brain regions showed varying levels of gene expression change in aging and AD. Among all the brain regions we studied, the hippocampus is one of the regions showing the strongest interconnection between aging and AD at the transcriptomic level. We observed common functional enrichment in aging and LOAD related to PTMs (especially alternative splicing and phosphoprotein), neurotransmission (especially glutamate), membrane, cytoskeleton, and lipid metabolism. We also showed that gene expression changes in the aging brain are more associated with acetylation, while AD-specific gene expression changes are more related to inflammatory response, glycoprotein, mitochondria, and synapse. Importantly, we demonstrated that cognitively normal brains are not transcriptomically homogeneous and that several major subgroups can be identified. By comparing gene expression among these subgroups, we showed that gene expression changes in a subpopulation overlap strongly with the AD signature, suggesting that although aging is a general risk factor for AD, only a subset of individuals may experience gene expression changes to a level significantly associated with the onset of the disease.
Our study points to several biological processes shared between LOAD and normal aging that may underlie how aging and AD interconnect. First, PTM-related genes, such as those for alternative splicing and phosphoprotein, are enriched in both up- and down-regulated ASGs, AADGs/ABADGs, and ADSGs (e.g., top function terms in Figures 3, 6). PTM regulation plays a critical role in synaptic plasticity at several levels (Kiltschewskij and Cairns, 2017), for example by increasing proteome diversity through alternative splicing, or by enabling activity-dependent regulation of mRNA localization, translation, or degradation in the dendrite. It will be interesting to investigate what events trigger the PTM changes observed in aging and AD. Second, the immune response featured strongly in the AD signatures, both in our results and in previous studies (Verbitsky et al., 2004; Reichwald et al., 2009; Bordner et al., 2011; Cribbs et al., 2012). Because it was less significantly up-regulated in ASGs/AADGs and in ABSGs/ABADGs, this supports the view that escalated inflammation is a hallmark of AD relative to normal brain aging, which itself is known to be associated with low-grade neuroinflammation. Whether the inflammatory response plays a key causative role in the early stage of AD development or is mainly a reactive response to upstream pathology requires further investigation.
Since it is infeasible to directly test whether individuals in the "healthy aging" subgroup A will remain cognitively normal decades later, we relied on an alternative strategy in which we compared the transcriptomes of the GTEx brain aging subgroups with those of a cohort of older individuals who either had AD or were cognitively normal. For the 51 PHG samples (14 control and 37 AD samples), 31 of the 37 AD samples clustered with GTEx subgroups B and C (the AD similar subgroups), while all 14 normal control samples clustered with GTEx subgroup A, which presumably represents a healthy brain aging group. This result suggests that the gene expression pattern of subgroup A is strongly associated with normal cognitive function in both relatively younger and older individuals, which supports that GTEx subgroup A is a healthy aging subgroup whose members will likely remain cognitively normal as they age. However, the PHG transcriptomes of several AD patients (6 out of 37 AD samples) clustered with GTEx subgroup A. This indicates that an individual with a healthy aging gene expression pattern in the hippocampus or PHG can still have AD. We think this could be explained by the heterogeneity of AD, which is increasingly recognized to have multiple subtypes (Neff et al., 2021). For example, at least four AD subtypes (typical, limbic-predominant, hippocampal-sparing, and minimal atrophy AD) have been reported based on the distribution of tau-related pathology and regional brain atrophy (Ferreira et al., 2020). Some AD subtypes may therefore have hippocampal transcriptomes similar to our healthy aging subgroup. For example, the hippocampal-sparing AD subtype has been reported to have a lower frequency of APOE ε4 than typical and limbic-predominant AD (Ferreira et al., 2020); interestingly, all 6 AD samples that clustered with GTEx subgroup A are from APOE ε3/ε3 donors.
Because AD is not confined to a single brain region, we believe that accurate prediction of AD development from gene expression, or from any other region-specific data, will require examining multiple regions together.
Although the PHG data suggest that different aging subgroups may have distinct probabilities of developing AD decades later, a causal link between particular aging subtypes and AD should not be assumed. Alternative mechanisms that may explain our observations should also be considered. For example, the aging subtypes could correspond to natural fluctuations of brain state, while LOAD may represent a rather different state (or several states) that is difficult to escape once entered. Many more studies are needed to fully elucidate the mechanisms underlying the aging subtypes and their link to AD development.
The results from cell deconvolution of the bulk gene expression data suggest that the differential expression between subgroups is at least partially due to changes in cell proportions across the subgroups. For example, the down-regulation of synapse genes in GTEx aging subgroup B might be explained by a decreased proportion of neurons in this subgroup's samples. However, the deconvolution method we used assumes that the reference cell-type expression profiles do not change across conditions (in our case, the aging subgroups), which may not be true. Therefore, although cell compositions very likely changed across subgroups, some genes may also have changed expression in a cell-type-specific manner across the aging subgroups. Single-cell profiling of samples from these aging subgroups would be highly useful to fully resolve this issue.
The cell deconvolution analysis suggested cell-type proportion changes in certain aging subgroups. The loss of neurons and repression of their gene expression in AD brains have been reported (Zarow et al., 2005; Mathys et al., 2019), and the activation of microglia in AD is also well recognized (Hopperton et al., 2018). Recent single-nucleus RNA-seq data from AD and control brains also suggest an increase in certain subpopulations of oligodendrocytes in AD brains (Mathys et al., 2019; Lau et al., 2020). Although technical variation in sample dissection may contribute to cell-type proportion differences across samples, the cell proportions estimated in GTEx subgroup A and in PHG control samples are highly comparable (see Figure 7), suggesting that sampling variation within the same brain region is unlikely to be a major cause of the cell-type proportion variation. In addition, the estimated cell-type proportions explained ∼36% of the total gene expression variation in GTEx HIPP samples and ∼28% in PHG samples, implying that other factors, such as cell-type-specific gene expression changes, also operate in these samples. Again, single-cell gene expression data from aging subgroups, compared with AD single-cell data, will be needed to further understand the differential gene expression among aging subgroups at a cell-type-specific level.
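The figures of ∼36% and ∼28% variance explained can be made concrete with a pooled R² from regressing each gene on the estimated proportions. The exact computation is not described in this section, so the Python sketch below is an assumption: each gene is centred, regressed on an intercept plus all but one cell-type proportion (the proportions sum to one, so the full set would be collinear with the intercept), and residual variance is pooled across genes. `variance_explained` is a hypothetical name.

```python
import numpy as np


def variance_explained(expr, props):
    """Fraction of total (gene-centred) expression variance explained by a
    linear model on cell-type proportions, pooled across genes.

    expr: (genes x samples) expression; props: (cell types x samples).
    """
    y = expr - expr.mean(axis=1, keepdims=True)  # centre each gene
    # Design: intercept + proportions, dropping one type (they sum to 1).
    x = np.column_stack([np.ones(props.shape[1]), props[:-1].T])
    beta, *_ = np.linalg.lstsq(x, y.T, rcond=None)  # one fit per gene
    resid = y.T - x @ beta
    return 1.0 - (resid ** 2).sum() / (y ** 2).sum()
```

Pooling sums of squares across genes weights highly variable genes more heavily; averaging per-gene R² values instead would give each gene equal weight and could produce a different figure.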
Finally, we observed no significant difference in sex distribution among GTEx subgroups (Fisher's exact test, p = 0.81) or UK subgroups (p = 0.37). For example, of the 36 samples in GTEx subgroup A, 27 were male and 9 female; of the 9 samples in GTEx subgroup B, 8 were male and 1 female; and of the 11 samples in GTEx subgroup C, 8 were male and 3 female. Because there were very few female samples in the AD similar subgroups, we lacked sufficient statistical power to investigate sex-related differences in these aging subgroups.
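The sex-distribution comparison applies Fisher's exact test to a subgroup × sex table. For tables with more than two rows, the usual generalization is the Freeman-Halton test, which sums the probabilities of all tables with the same margins that are no more likely than the observed one. The pure-Python sketch below implements this for an r × 2 table using exact integer arithmetic; it is an illustration, not necessarily the software the authors used, and its result may differ slightly from implementations that compare probabilities with a floating-point tolerance. `fisher_exact_rx2` is a hypothetical name.

```python
import math


def fisher_exact_rx2(table):
    """Two-sided Freeman-Halton exact test for an r x 2 contingency table.

    table: sequence of (count_in_column_1, count_in_column_2), one per row.
    With row and column totals fixed, a table with first-column counts a_i
    has probability prod_i C(r_i, a_i) / C(n, c1).
    """
    row_totals = [a + b for a, b in table]
    col1_total = sum(a for a, _ in table)
    n = sum(row_totals)

    def weight(firsts):
        # Numerator of the table probability, kept as an exact integer.
        w = 1
        for r, a in zip(row_totals, firsts):
            w *= math.comb(r, a)
        return w

    observed = weight([a for a, _ in table])
    tail = 0

    def walk(i, remaining, firsts):
        # Enumerate every way to split col1_total across the rows.
        nonlocal tail
        if i == len(row_totals) - 1:
            if remaining <= row_totals[i]:
                w = weight(firsts + [remaining])
                if w <= observed:
                    tail += w
            return
        for a in range(min(row_totals[i], remaining) + 1):
            walk(i + 1, remaining - a, firsts + [a])

    walk(0, col1_total, [])
    return tail / math.comb(n, col1_total)


if __name__ == "__main__":
    # GTEx subgroups A, B, C as (males, females), using the counts in the text.
    print(fisher_exact_rx2([(27, 9), (8, 1), (8, 3)]))
```

Exact enumeration is cheap here because a 3 × 2 table with these margins has only a few hundred possible configurations.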
In summary, our study suggests that combined analysis of aging and LOAD can help us understand how aging may contribute to the development of LOAD. Since most genomic studies on AD have relied on samples from donors in their 70s to 90s, these samples may not capture the molecular events that occur at the very early stages of disease development. As we demonstrated in this work, the brains of a subpopulation of cognitively normal individuals in their 40s to 70s already show gene expression changes similar to AD. This supports the idea that the initiation of LOAD could occur decades earlier than the manifestation of clinical phenotypes and that it may be critical to closely study cognitively normal