ORIGINAL RESEARCH article

Front. Aging Neurosci., 18 March 2022

Sec. Alzheimer's Disease and Related Dementias

Volume 14 - 2022 | https://doi.org/10.3389/fnagi.2022.752858

Identification of Potential Driver Genes and Pathways Based on Transcriptomics Data in Alzheimer's Disease

  • School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China

Abstract

Alzheimer's disease (AD) is one of the most common neurodegenerative diseases. In this study, we aimed to identify AD-related genes from transcriptomics data to help develop new drugs for AD. First, we obtained differentially expressed gene (DEG)-enriched co-expression networks between AD and normal samples in multiple transcriptomics datasets by weighted gene co-expression network analysis (WGCNA). Then, a convergent functional genomics (CFG) approach integrating multiple lines of AD-related evidence was used to prioritize potential driver genes from the DEG-enriched modules. Subsequently, we identified candidate genes in the list of potential driver genes. Lastly, we combined deepDTnet and SAveRUNNER to predict interactions among candidate genes, drugs, and AD. Experiments on five datasets showed that GJA1 had the highest CFG score among all potential driver genes of AD. Moreover, target-drug-disease network prediction linked GJA1 to AD. Therefore, the candidate gene GJA1 is the most likely target for AD. In summary, identification of AD-related genes contributes to the understanding of AD pathophysiology and the development of new drugs.

Introduction

Alzheimer's disease (AD) is one of the most common neurodegenerative diseases and accounts for the majority of dementia cases (Wood, 2018; Darby et al., 2019). By 2050, AD is projected to affect 13.8 million individuals in the United States (US), 7.0 million of whom will be aged 85 years or older (Alzheimer's Association, 2018; Cummings et al., 2019). Genetic factors are believed to be partially responsible for AD (Xu et al., 2018), and genome-wide association studies (GWAS) have revealed single nucleotide polymorphisms (SNPs) that contribute to AD onset (Hao et al., 2019; Andrews et al., 2020). Well-known AD genes include amyloid precursor protein (APP), presenilin-1 (PSEN1), presenilin-2 (PSEN2), and apolipoprotein E (APOE). APP, PSEN1, and PSEN2 are established pathogenic genes of early-onset AD (Lanoiselée et al., 2017), whereas APOE, the strongest established genetic risk factor for late-onset AD, can increase the rate of cognitive decline (Wijsman et al., 2011). Different microRNAs (miRNAs) are also involved in the pathophysiology of AD (Femminella et al., 2015). For example, miRNA-377 promotes cell proliferation and inhibits apoptosis by regulating the expression of cadherin 13 (CDH13), thereby participating in the occurrence and development of AD (Liu et al., 2018). Long non-coding RNAs (lncRNAs) have been widely reported to be associated with a variety of physiological and pathological processes, including AD. Brain cytoplasmic RNA is a lncRNA whose overexpression may lead to synaptic/dendritic degeneration in AD (Doxtater et al., 2020). Despite remarkable advances in understanding the genetic basis of AD, no disease-modifying therapy for AD is available. Identifying AD-related genes from transcriptomics data has therefore become an attractive strategy for finding potential drug targets.

Gene expression profiling of transcriptomic datasets of AD and normal brain samples has identified potential AD genes and contributed to the search for drug targets (Patel et al., 2019). Correlation networks are often used to analyze gene expression data and to extract biologically relevant information from genes with similar co-expression patterns. At present, two commonly used gene co-expression network algorithms are SWItchMiner (SWIM) (Falcone et al., 2019) and Weighted Gene Co-expression Network Analysis (WGCNA) (Nangraj et al., 2020; Ren et al., 2020). SWIM constructs an unweighted correlation network and uses local and global graph attributes to mine so-called switch genes, which have been associated with drastic changes in cell phenotype, such as cancer development. WGCNA builds a correlation network that can be weighted or unweighted and identifies key genes by measuring the centrality of each gene in the network. However, SWIM does not consider scale-free topology. The most notable characteristic of a scale-free network is the relative commonness of vertices whose degree greatly exceeds the average; these highest-degree nodes are often called "hubs" and are thought to serve specific purposes in their networks. WGCNA builds an approximately scale-free network to determine the relationships between genes, which enables the identification of modules (clusters) of highly correlated genes and of the hub gene in each module. WGCNA is therefore well suited to identifying gene modules and key genes that contribute to phenotypic traits. Here, we used WGCNA to mine AD-specific modules from the DEGs between AD and normal samples and identified candidate genes from these AD-specific modules.

Studying target-drug-disease networks can contribute to the search for candidate genes for AD. In recent years, deep learning has been widely applied in biomedicine, and many deep learning frameworks have been developed to predict drug-target interactions (DTIs) (Xia et al., 2019). Öztürk et al. (2018) proposed a convolutional neural network (CNN)-based method that uses only sequence information and evaluated DTI prediction on the Davis and KIBA datasets. Rayhan et al. developed FRnet-DTI, which uses an autoencoder for feature extraction and a CNN for classification (Chu et al., 2021). Zeng et al. (2020a) used cascade deep forest and arbitrary-order neighboring algorithms to predict DTIs. Zeng et al. (2020b) developed deepDTnet, a deep learning methodology for new target identification and drug repurposing in a heterogeneous network that embeds 15 types of chemical, genomic, phenotypic, and cellular network profiles. Many methods have also been proposed for drug repurposing. Zeng et al. (2019) presented deepDR (deep learning-based drug repositioning) to systematically infer new drug-disease relationships for in silico drug repurposing. Fiscon et al. (2021) proposed SAveRUNNER, which predicts drug-disease associations by quantifying the interplay between drug targets and disease-specific proteins in the human interactome through a network-based similarity measure that prioritizes drugs and diseases located in the same network neighborhood. Here, we combined deepDTnet and SAveRUNNER to predict interactions among candidate genes, drugs, and AD.

In this paper, we aimed to search for potential driver genes of AD among the DEGs from multiple transcriptomics datasets. We hypothesized that the DEGs might be regulated by several candidate genes in the DEG-enriched co-expression modules identified by WGCNA. We used the CFG score to measure how likely each candidate gene is to be an AD target. Further, we combined deepDTnet and SAveRUNNER to predict interactions between candidate genes and AD based on a gene-drug-disease network. The overall workflow is shown in Figure 1.

Figure 1

Flowchart of the whole study. (1) Data collection from AlzData and ADNI; (2) data preprocessing (e.g., eliminating samples with missing data); (3) DEG detection with |logFC| > 0.1 and FDR < 0.05; (4) enrichment analysis of biological processes with DAVID 6.8; (5) identification of AD-specific modules with WGCNA; (6) prioritization of driver genes of AD by CFG score; (7) identification of candidate genes with CFG ≥ 5; (8) collection of target, drug, and disease datasets; (9) prediction of associations between candidate genes and AD by combining deepDTnet and SAveRUNNER.

Materials and Methods

AD Expression Data Collection and Preprocessing

Our data came from the AlzData and ADNI databases. Xu et al. constructed the AlzData database (http://www.alzdata.org/), which includes expression data from the hippocampus (HP), entorhinal cortex (EC), frontal cortex (FC), and temporal cortex (TC). The original microarray data for these four regions were obtained from the Gene Expression Omnibus (GEO) (https://www.ncbi.nlm.nih.gov/geo) by searching with the keyword “Alzheimer.” Data were retrieved using the following criteria: 1) AD-related expression profiles in the ArrayExpress database (https://www.ebi.ac.uk/arrayexpress/) were checked to avoid potential omissions; 2) studies with no genome-wide probes or only a few probes were filtered out; 3) for GSE series with possibly duplicated samples or identical sample sources, the series with the larger sample size was retained and the other was excluded; 4) only expression profiles of human postmortem brain tissue from HP, EC, FC, and TC, the main regions affected by AD, were included; and 5) data retrieval and quality control were double-checked by two investigators. To ensure data quality, samples from donors younger than 50 years, and samples that were outliers in our principal component analysis (PCA) of the expression distribution, were excluded from this study.
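The age and PCA-outlier exclusions above can be sketched in Python with NumPy. This is a minimal illustration, not the AlzData pipeline: the source does not say how PCA outliers were defined, so the rule used here (a sample is an outlier if its score on any of the leading principal components lies more than 3 standard deviations from the mean) and the function name `filter_samples` are our own assumptions.

```python
import numpy as np


def filter_samples(expr, ages, min_age=50, n_sd=3.0, n_pcs=2):
    """Return a boolean mask of samples (columns of `expr`) to keep.

    expr : genes x samples matrix of log-scale expression values
    ages : donor age in years for each sample
    A sample is dropped if the donor is younger than `min_age`, or if its score
    on any of the first `n_pcs` principal components lies more than `n_sd`
    standard deviations from the mean score (hypothetical outlier rule).
    """
    expr = np.asarray(expr, dtype=float)
    ages = np.asarray(ages, dtype=float)
    # Centre each gene so that the SVD below is a PCA over samples.
    centred = expr - expr.mean(axis=1, keepdims=True)
    # Rows of `centred.T` are samples; u * s gives the sample PC scores.
    u, s, _ = np.linalg.svd(centred.T, full_matrices=False)
    scores = u[:, :n_pcs] * s[:n_pcs]
    z = (scores - scores.mean(axis=0)) / scores.std(axis=0)
    is_outlier = (np.abs(z) > n_sd).any(axis=1)
    return (ages >= min_age) & ~is_outlier


# Toy example: 200 genes x 30 samples, one grossly shifted sample and one young donor.
rng = np.random.default_rng(0)
expr = rng.normal(size=(200, 30))
expr[:, 5] += 50.0          # gross outlier
ages = np.full(30, 75.0)
ages[10] = 45.0             # below the age cutoff
keep = filter_samples(expr, ages, n_pcs=1)
print(keep.sum())           # number of retained samples
```

In this toy example the shifted sample and the 45-year-old donor are the two samples removed.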

For the ADNI data (http://adni.loni.usc.edu), gene expression profiling of peripheral blood samples collected in PAXgene tubes for RNA analysis was performed on the Affymetrix Human Genome U219 Array (www.affymetrix.com, Santa Clara, CA) for ADNI and on the Illumina Whole-Genome DASL assay (www.illumina.com, San Diego, CA) for AddNeuroMed and MCSA. All probe sets were mapped and annotated with reference to the human genome (hg19). Raw expression values were pre-processed with the robust multi-array average (RMA) normalization method, followed by standard quality control (QC) procedures on samples and probe sets. We checked for discrepancies between the reported sex and the sex determined from sex-specific gene expression (including XIST and USP9Y), and evaluated whether SNP genotypes matched the genotypes predicted from the gene expression data.
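The sex-consistency check can be illustrated with a small NumPy sketch. The source names XIST (X-linked, highly expressed in females) and USP9Y (Y-linked, expressed in males) but not the decision rule, so the comparison of standardized expression values below, the function name `sex_mismatches`, and the toy values are hypothetical.

```python
import numpy as np


def sex_mismatches(xist, usp9y, reported_sex):
    """Flag samples whose reported sex disagrees with expression-inferred sex.

    xist, usp9y  : log-scale expression of XIST and USP9Y for each sample
    reported_sex : sequence of "F" / "M" labels
    Hypothetical rule: a sample is called female when its standardized XIST
    level is higher than its standardized USP9Y level, and male otherwise.
    """
    xist = np.asarray(xist, dtype=float)
    usp9y = np.asarray(usp9y, dtype=float)
    z_xist = (xist - xist.mean()) / xist.std()
    z_usp9y = (usp9y - usp9y.mean()) / usp9y.std()
    inferred = np.where(z_xist > z_usp9y, "F", "M")
    return inferred != np.asarray(reported_sex)


xist = [10.0, 9.5, 2.0, 2.5, 10.2, 1.8]
usp9y = [1.0, 1.2, 8.0, 7.5, 0.9, 8.3]
reported = ["F", "F", "M", "M", "M", "M"]   # sample 4 looks mislabelled
print(sex_mismatches(xist, usp9y, reported).tolist())
# [False, False, False, False, True, False]
```

Flagged samples would then be reviewed or excluded; the genotype-versus-expression check mentioned in the text is not covered by this sketch.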

In this study, we considered only gene expression data and a binary comparison (control vs. AD). After data processing (e.g., eliminating samples with missing data), 467 controls and 309 AD samples from five datasets were retained for subsequent analyses: EC (39 vs. 39), HP (67 vs. 74), FC (128 vs. 104), TC (39 vs. 52), and ADNI (194 vs. 40). Detailed information on each dataset is shown in Table 1.

Table 1

Dataset | AlzData: Entorhinal cortex | AlzData: Hippocampus | AlzData: Frontal cortex | AlzData: Temporal cortex | Alzheimer's Disease Neuroimaging Initiative
--- | --- | --- | --- | --- | ---
Abbreviation | EC | HP | FC | TC | ADNI
No. of genes | 15,361 | 16,313 | 11,779 | 15,462 | 49,387
Sample size (Control/AD) | 78 (39/39) | 141 (67/74) | 232 (128/104) | 91 (39/52) | 234 (194/40)
Age (years) | 80 (29.6) | 81.7 (9.6) | 83 (9.4) | 81 (8.7) | 74.3 (6.5)
Male/Female/Unknown | 35/43/0 | 68/73/0 | 99/111/22 | 32/41/18 | 116/118/0
Aβ | NA | NA | NA | NA | 1142.9 (494.9)
Tau | NA | NA | NA | NA | 25.4 (11.6)

Brief descriptions of the five datasets.

EC, HP, FC, and TC come from AlzData; ADNI comes from the Alzheimer's Disease Neuroimaging Initiative. SDs are given in parentheses.

Statistical Analysis

Genes with an absolute log2 fold change greater than 0.1 (|logFC| > 0.1) and a false discovery rate below 0.05 (FDR < 0.05) were defined as DEGs in each dataset. Differential expression analysis was conducted with the R package limma, and the Benjamini-Hochberg method was used to correct for multiple comparisons (Xu et al., 2018). Functional enrichment of the DEGs was assessed with the Database for Annotation, Visualization and Integrated Discovery (DAVID) 6.8, which provides a comprehensive set of functional annotation tools for understanding the biological meaning behind large gene lists. For a given list of DEGs, DAVID 6.8 identifies enriched biological themes, particularly KEGG pathways and GO terms (Huang et al., 2007).
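The DEG thresholds can be made concrete with a short Python/NumPy sketch. It assumes per-gene p-values have already been computed (in the study, by limma's moderated t-test, which is not reproduced here) and applies the Benjamini-Hochberg adjustment followed by the |logFC| > 0.1 and FDR < 0.05 cut-offs; the function names are illustrative.

```python
import numpy as np


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (FDR), returned in input order."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    # p_(i) * n / i for the sorted p-values.
    ranked = p[order] * n / np.arange(1, n + 1)
    # Make the adjusted values monotone: running minimum from the largest p down.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted


def call_degs(log_fc, pvals, lfc_cut=0.1, fdr_cut=0.05):
    """DEG mask using the thresholds in the text: |logFC| > 0.1 and FDR < 0.05."""
    fdr = benjamini_hochberg(pvals)
    return (np.abs(np.asarray(log_fc, dtype=float)) > lfc_cut) & (fdr < fdr_cut)


pvals = [0.01, 0.04, 0.03, 0.20]
log_fc = [0.5, -0.3, 0.05, 1.0]
print(benjamini_hochberg(pvals))   # approximately [0.04, 0.0533, 0.0533, 0.2]
print(call_degs(log_fc, pvals))    # only the first gene is a DEG
```

For the four toy p-values, only the first gene has FDR < 0.05, and it also passes the fold-change cut-off.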

Weighted Gene Co-expression Network Analysis

We used the R package WGCNA to perform weighted correlation network analysis. For genes $i$ and $j$ with correlation coefficient $r_{ij}$, the connection strength (adjacency) is defined as

$$a_{ij} = \lvert r_{ij} \rvert^{\beta}$$

which depends on the choice of the soft-thresholding power $\beta$ (values from 1 to 20 were tested). The power was chosen such that the scale-free topology fit index (scale independence) exceeded 0.80, yielding an approximately scale-free network. The adjacency matrix was then transformed into a topological overlap matrix (TOM), which was converted into a distance matrix (1 − TOM) and used as the basis for hierarchical clustering. A dynamic tree-cutting algorithm was applied to the resulting dendrogram to partition the genes into disjoint sets (modules). Finally, we extracted the genes in each module for further analysis (Botía et al., 2017).
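A minimal NumPy sketch of these steps follows. It mirrors the unsigned adjacency and the standard TOM formula, but the scale-free fit index is a simplified version of WGCNA's (10 equal-width connectivity bins, signed R² of log10 p(k) on log10 k), the dynamic tree cut is not reproduced, and all function names are hypothetical.

```python
import numpy as np


def adjacency(expr, beta):
    """Unsigned WGCNA-style adjacency a_ij = |r_ij| ** beta (expr: genes x samples)."""
    a = np.abs(np.corrcoef(expr)) ** beta
    np.fill_diagonal(a, 0.0)          # no self-connections
    return a


def scale_free_fit(a, n_bins=10):
    """Simplified scale-free fit index: signed R^2 of log10 p(k) versus log10 k."""
    k = a.sum(axis=1)                               # connectivity of each gene
    counts, edges = np.histogram(k, bins=n_bins)
    mids = 0.5 * (edges[:-1] + edges[1:])
    keep = counts > 0
    x = np.log10(mids[keep])
    y = np.log10(counts[keep] / k.size)
    if x.size < 2 or np.allclose(y, y[0]):
        return 0.0
    slope, intercept = np.polyfit(x, y, 1)
    resid = y - (slope * x + intercept)
    r2 = 1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)
    # Mirroring WGCNA's signed R^2: only a negative slope yields a positive index.
    return -np.sign(slope) * r2


def pick_soft_threshold(expr, powers=range(1, 21), r2_cut=0.80):
    """Smallest power whose scale-free fit index exceeds r2_cut (None if none does)."""
    for beta in powers:
        if scale_free_fit(adjacency(expr, beta)) > r2_cut:
            return beta
    return None


def topological_overlap(a):
    """TOM_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij), with l = A @ A."""
    k = a.sum(axis=1)
    shared = a @ a                  # zero diagonal, so u = i and u = j drop out
    tom = (shared + a) / (np.minimum.outer(k, k) + 1.0 - a)
    np.fill_diagonal(tom, 1.0)
    return tom


rng = np.random.default_rng(1)
expr = rng.normal(size=(20, 15))    # 20 genes x 15 samples (toy data)
beta = pick_soft_threshold(expr)
a = adjacency(expr, beta if beta is not None else 6)
dissimilarity = 1.0 - topological_overlap(a)   # input to hierarchical clustering
```

On random toy data no power may reach the 0.80 cut-off, which is why the sketch falls back to a fixed power; real expression data with modular structure typically behave differently.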

deepDTnet and SAveRUNNER

In this study, we combined deepDTnet and SAveRUNNER to predict interactions between candidate genes and AD: deepDTnet was applied to predict interactions between candidate genes (targets) and drugs, and SAveRUNNER was applied to predict associations between drugs and diseases.

First, deepDTnet uses a stacked denoising autoencoder (SDAE) to obtain low-dimensional embeddings of both drugs and targets. The SDAE minimizes a regularized reconstruction error, defined as follows:

$$\min_{\{W_l,\, b_l\}} \sum_{x} \lVert x - x_L \rVert_2^2 + \lambda \sum_{l=1}^{L} \lVert W_l \rVert_F^2$$

where $x$ is an input sample (a vector) and $x_L$ is its reconstruction at the output layer, computed from a noise-corrupted copy of $x$; $L$ is the number of layers; $W_l$ and $b_l$ are the weight matrix and bias vector of layer $l \in \{1, \ldots, L\}$; $\lambda$ is a regularization parameter; and $\lVert\cdot\rVert_F$ denotes the Frobenius norm. The middle layer is key: it enables the SDAE to reduce dimensionality and extract effective representations of the side information.
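To make the SDAE objective concrete, the following NumPy sketch trains a single denoising autoencoder layer (deepDTnet stacks several) with plain gradient descent. It is a toy illustration under our own assumptions, not deepDTnet's code: the masking-noise corruption, sigmoid hidden layer, linear output, learning rate, and averaging the reconstruction error over samples are all choices made here for simplicity.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def objective(params, X, lam):
    """Reconstruction error (averaged over samples) plus lambda * sum ||W_l||_F^2."""
    H = sigmoid(X @ params["W1"] + params["b1"])
    X_hat = H @ params["W2"] + params["b2"]
    penalty = np.sum(params["W1"] ** 2) + np.sum(params["W2"] ** 2)
    return np.sum((X - X_hat) ** 2) / X.shape[0] + lam * penalty


def train_dae(X, n_hidden=3, lam=1e-4, drop=0.2, lr=0.1, epochs=500, seed=0):
    """One-hidden-layer denoising autoencoder trained by full-batch gradient descent."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    params = {
        "W1": rng.normal(scale=0.1, size=(d, n_hidden)), "b1": np.zeros(n_hidden),
        "W2": rng.normal(scale=0.1, size=(n_hidden, d)), "b2": np.zeros(d),
    }
    initial = {key: value.copy() for key, value in params.items()}
    for _ in range(epochs):
        # Masking noise: zero out a random fraction `drop` of the input entries.
        X_noisy = X * (rng.random(X.shape) > drop)
        H = sigmoid(X_noisy @ params["W1"] + params["b1"])
        X_hat = H @ params["W2"] + params["b2"]
        G = 2.0 * (X_hat - X) / n                  # d(loss)/d(X_hat); target is the clean X
        dZ = (G @ params["W2"].T) * H * (1.0 - H)  # back through the sigmoid
        grads = {
            "W2": H.T @ G + 2.0 * lam * params["W2"], "b2": G.sum(axis=0),
            "W1": X_noisy.T @ dZ + 2.0 * lam * params["W1"], "b1": dZ.sum(axis=0),
        }
        for key in params:
            params[key] -= lr * grads[key]
    return initial, params


rng = np.random.default_rng(42)
X = rng.random((50, 8))                 # 50 toy samples with 8 features
initial, trained = train_dae(X)
embedding = sigmoid(X @ trained["W1"] + trained["b1"])   # 50 x 3 low-dimensional code
print(objective(initial, X, 1e-4), objective(trained, X, 1e-4))
```

The hidden activations (`embedding`) play the role of the middle-layer representation; in a stacked version, they would feed the next autoencoder layer.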

Subsequently, positive-unlabeled (PU) matrix completion is used to predict unknown drug-target pairs. Let the drug-target interaction matrix be $P \in \{0,1\}^{N_d \times N_t}$, where $N_d$ is the number of drugs and $N_t$ is the number of targets. $P_{ij} = 1$ indicates that drug $i$ is linked to target $j$, whereas $P_{ij} = 0$ indicates that the relationship is unobserved. The optimization problem is parameterized as:

$$\min_{W,H} \sum_{(i,j)\in\Omega^{+}} \left(P_{ij} - (WH^{\top})_{ij}\right)^2 + \alpha \sum_{(i,j)\in\Omega^{-}} \left(P_{ij} - (WH^{\top})_{ij}\right)^2 + \frac{\lambda}{2}\left(\lVert W\rVert_F^2 + \lVert H\rVert_F^2\right)$$

where $\Omega \subseteq \{1,\ldots,N_d\} \times \{1,\ldots,N_t\}$ is the set of observed entries of the true underlying matrix, including both positive and negative entries, such that $\Omega = \Omega^{+} \cup \Omega^{-}$; $\Omega^{+}$ denotes the observed positive samples and $\Omega^{-}$ denotes the missing entries chosen as negatives. The matrix is assumed to be low rank, i.e., $W \in \mathbb{R}^{N_d \times k}$ and $H \in \mathbb{R}^{N_t \times k}$ share a low-dimensional latent space with $k \ll N_d, N_t$. In this biased inductive matrix completion, $\alpha$ is the key parameter that weights the loss on the sampled negatives, and $\lambda$ is a regularization parameter. Next, the pairwise interaction score between drug $i$ and target $j$ is approximated as:

$$S_{ij} = (WH^{\top})_{ij}$$

where a higher score means a higher likelihood that drug $i$ interacts with target $j$.
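A toy version of this weighted objective can be minimized with gradient descent in NumPy. The sketch assumes a plain low-rank factorization P ≈ W·Hᵀ; an inductive variant would additionally project drug and target feature vectors (for example, SDAE embeddings) through W and H, which is omitted here. Treating every unobserved entry as a sampled negative, and the values of k, α, λ, and the learning rate, are illustrative assumptions.

```python
import numpy as np


def pu_matrix_completion(P, k=2, alpha=0.1, lam=0.01, lr=0.01, epochs=2000, seed=0):
    """Factorize a 0/1 drug-target matrix P ~ W @ H.T with PU-style weighting.

    Observed links (P == 1) form Omega+ with weight 1; here every unobserved
    entry is treated as a sampled negative (Omega-) with weight alpha.
    Minimizes sum(weight * (P - W H^T)^2) + lam/2 * (||W||_F^2 + ||H||_F^2).
    """
    rng = np.random.default_rng(seed)
    P = np.asarray(P, dtype=float)
    n_drugs, n_targets = P.shape
    weight = np.where(P == 1, 1.0, alpha)
    W = rng.normal(scale=0.1, size=(n_drugs, k))
    H = rng.normal(scale=0.1, size=(n_targets, k))
    for _ in range(epochs):
        R = weight * (W @ H.T - P)        # weighted residual
        grad_W = 2.0 * R @ H + lam * W
        grad_H = 2.0 * R.T @ W + lam * H
        W, H = W - lr * grad_W, H - lr * grad_H
    return W, H


# Two toy blocks of drugs/targets; the link drug 0 -> target 2 is hidden (set to 0).
P = np.zeros((6, 6))
P[:3, :3] = 1.0
P[3:, 3:] = 1.0
P[0, 2] = 0.0
W, H = pu_matrix_completion(P)
scores = W @ H.T                      # S_ij: higher means a more likely interaction
print(round(scores[0, 2], 2), round(scores[0, 4], 2))
```

Because the hidden entry is only weakly penalized (weight α), the low-rank structure of its block pulls its score above that of the cross-block pairs, which is the behavior PU weighting is meant to produce.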

Then, to quantify the vicinity between drug and disease modules, SAveRUNNER implements a network-based similarity measure:

$$\mathrm{sim} = f(p)\cdot QC, \qquad f(p) = 1 - \frac{1}{1 + e^{-c\,(p/m - d)}}$$

where $p$ is the network proximity measure

$$p = \frac{1}{\lvert T \rvert} \sum_{t \in T} \min_{s \in S} \mathrm{dist}(s, t)$$

which represents the average shortest path length $\mathrm{dist}(s, t)$ between drug targets $t$ in the drug module $T$ and the nearest disease genes $s$ in the disease module $S$; $QC$ is the quality cluster score; $m = \max(p)$; and $c$ and $d$ are the steepness and the midpoint of $f(p)$, respectively.
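The proximity and similarity computations can be sketched in plain Python. The sketch assumes an unweighted interactome stored as an adjacency dictionary, computes p as the mean distance from each drug target to its nearest disease gene (breadth-first search), and maps p/m through a decreasing logistic function with steepness c and midpoint d, multiplied by QC. The parameter values (c = 5, d = 0.5) and QC = 1 are placeholders rather than SAveRUNNER's settings, and the quality cluster score itself is not computed.

```python
import math
from collections import deque


def bfs_distances(graph, source):
    """Shortest-path lengths (number of edges) from `source` in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def closest_proximity(graph, targets, disease_genes):
    """p = mean over drug targets t of the distance to the nearest disease gene s.

    Assumes every target can reach at least one disease gene.
    """
    total = 0
    for t in targets:
        dist = bfs_distances(graph, t)
        total += min(dist[s] for s in disease_genes if s in dist)
    return total / len(targets)


def similarity(p, m, c, d, qc=1.0):
    """Decreasing logistic in p/m (steepness c, midpoint d), scaled by QC."""
    return (1.0 - 1.0 / (1.0 + math.exp(-c * (p / m - d)))) * qc


# Toy interactome as an adjacency dictionary.
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("B", "F")]
graph = {}
for u, v in edges:
    graph.setdefault(u, set()).add(v)
    graph.setdefault(v, set()).add(u)

disease = {"D", "E"}
p_drug1 = closest_proximity(graph, {"A", "F"}, disease)   # 3.0
p_drug2 = closest_proximity(graph, {"C"}, disease)        # 1.0
m = max(p_drug1, p_drug2)
print(similarity(p_drug1, m, c=5, d=0.5), similarity(p_drug2, m, c=5, d=0.5))
```

Drug 2, whose only target is one step from the disease module, receives the higher similarity score.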

Finally, by combining deepDTnet and SAveRUNNER, we identified new relationships among candidate genes, drugs, and neurodegenerative diseases, including AD.

More details about deepDTnet and SAveRUNNER can be found in the original studies (Zeng et al., 2020b; Fiscon et al., 2021).

Convergent Functional Genomics

The potential driver genes were prioritized from the AD-specific modules by the CFG method, which integrates multiple levels of AD-related evidence (Ayalew et al., 2012; Xu et al., 2018). The CFG score ranges from 0 to 5, with 5 indicating the highest priority. Five lines of AD-related evidence were considered: 1) Genetic association: if a gene had at least one locus significantly associated with AD in the summary statistics of the International Genomics of Alzheimer's Project (IGAP), 1 point was assigned; otherwise 0. 2) Genetic regulation of gene expression: if a gene was associated with expression quantitative trait loci (eQTLs) showing AD risk in the IGAP data, 1 point was assigned; otherwise 0. 3) Protein-protein interaction: if a gene physically interacted with any AD core gene (APP, PSEN1, PSEN2, APOE, or MAPT), 1 point was assigned; otherwise 0. 4) Expression correlation with AD pathology: if the expression level of a gene was correlated with AD pathology in AD mice, 1 point was assigned; otherwise 0. 5) Early alteration in the AD mouse brain: if a gene was differentially expressed in the hippocampus of 2-month-old AD mice compared with age-matched wild-type mice, 1 point was assigned; otherwise 0.
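The scoring rule is simple enough to state as code. In this hypothetical helper, each argument holds one line of evidence, `None` stands in for "NA", and `pathology_correlated` is taken as true when either the Aβ or the Tau correlation in Table 4 is significant (our reading of the table). The example values for GJA1, FOXO1, and PRKX are copied from Table 4.

```python
def cfg_score(gwas_snps, eqtl_snps, ppi_core_genes, pathology_correlated, early_deg):
    """Convergent functional genomics score: one point per line of evidence (0-5).

    Missing evidence (None, standing in for "NA") contributes zero points.
    """
    evidence = [
        (gwas_snps or 0) > 0,            # 1) AD-associated locus in IGAP summary statistics
        (eqtl_snps or 0) > 0,            # 2) eQTLs with AD risk in IGAP data
        len(ppi_core_genes or ()) > 0,   # 3) interaction with APP, PSEN1, PSEN2, APOE or MAPT
        bool(pathology_correlated),      # 4) expression correlated with pathology in AD mice
        bool(early_deg),                 # 5) DEG in hippocampus of 2-month-old AD mice
    ]
    return sum(evidence)


# Evidence copied from Table 4 (None = "NA").
gja1 = cfg_score(gwas_snps=2, eqtl_snps=2, ppi_core_genes=["PSEN1", "MAPT", "APOE"],
                 pathology_correlated=True, early_deg=True)
foxo1 = cfg_score(gwas_snps=0, eqtl_snps=1, ppi_core_genes=["PSEN2"],
                  pathology_correlated=True, early_deg=True)
prkx = cfg_score(gwas_snps=None, eqtl_snps=3, ppi_core_genes=["PSEN1"],
                 pathology_correlated=True, early_deg=True)
print(gja1, foxo1, prkx)  # 5 4 4
```

These reproduce the CFG scores reported in Table 4 for the three genes.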

Results

DEG Detection

A total of 776 samples and 108,302 genes from multiple transcriptomic datasets were compiled for DEG detection. To balance the ADNI dataset (194 controls vs. 40 AD), we randomly sampled 40 controls 10 times and retained genes that were detected as DEGs in at least 3 of the 10 runs. DEGs in the five datasets are shown as red points in Figure 2. We identified 7,567 non-redundant DEGs (2,166 in EC, 1,952 in HP, 949 in FC, 3,075 in TC, and 3,204 in ADNI) for subsequent analyses; about 6–19% of the genes in each dataset were identified as DEGs. In all five datasets, the expression of well-known AD risk genes, such as APP, PSEN1, PSEN2, APOE, and MAPT, was only slightly altered or unchanged in AD patients. In addition, 19 genes were consistently differentially expressed across EC, HP, FC, TC, and ADNI (Figure 3).

We then investigated the functional enrichment of the AD-related DEGs. The 7,567 DEGs were enriched in 324 KEGG pathways and 1,381 GO terms (Figure 4), of which 61 KEGG pathways and 324 GO terms reached P < 0.005. As shown in Table 2, these include several pathways previously reported to be associated with AD, such as the Alzheimer's disease pathway, the MAPK signaling pathway, and the AMPK signaling pathway. The top 20 significant KEGG pathways for each dataset are shown in Figure 5. GO terms are organized hierarchically into ontologies. Among biological processes, the DEGs were significantly enriched in synapse-related functions (Table 3), such as chemical synaptic transmission, regulation of postsynaptic membrane potential, synaptic vesicle exocytosis, synaptic transmission (GABAergic), regulation of synaptic transmission (glutamatergic), synaptic vesicle endocytosis, and long-term synaptic potentiation. They were also enriched in neuron-related processes, including neurotransmitter secretion, neuron projection morphogenesis, negative regulation of neuron apoptotic process, and negative regulation of neuron projection development.
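The ADNI resampling step (40 controls drawn 10 times, genes kept if called as DEGs in at least 3 runs) can be sketched as follows. The DEG caller is a pluggable function; `mean_shift_caller` is a hypothetical stand-in for the limma-based caller that simply thresholds the difference in group means, and drawing the controls without replacement within each round is our assumption.

```python
import numpy as np


def mean_shift_caller(ad, ctrl, cutoff=1.0):
    """Hypothetical stand-in for limma: flag genes whose group means differ by > cutoff."""
    return np.abs(ad.mean(axis=1) - ctrl.mean(axis=1)) > cutoff


def resampled_degs(expr, is_ad, call_degs, n_rounds=10, min_count=3, seed=0):
    """Repeatedly subsample controls to the AD group size and count DEG calls per gene.

    Returns (mask of genes called in >= min_count rounds, per-gene counts).
    """
    rng = np.random.default_rng(seed)
    is_ad = np.asarray(is_ad, dtype=bool)
    ad_idx = np.flatnonzero(is_ad)
    ctrl_idx = np.flatnonzero(~is_ad)
    counts = np.zeros(expr.shape[0], dtype=int)
    for _ in range(n_rounds):
        sampled = rng.choice(ctrl_idx, size=ad_idx.size, replace=False)
        counts += call_degs(expr[:, ad_idx], expr[:, sampled])
    return counts >= min_count, counts


# Toy data: 100 genes, 60 controls and 10 AD samples; genes 0-4 are up in AD.
rng = np.random.default_rng(3)
is_ad = np.array([False] * 60 + [True] * 10)
expr = rng.normal(size=(100, 70))
expr[:5, is_ad] += 5.0
mask, counts = resampled_degs(expr, is_ad, mean_shift_caller)
print(np.flatnonzero(mask)[:10], counts[:5])
```

Requiring a gene to recur across several balanced subsamples makes the DEG list less dependent on any single draw of controls.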

Table 2

ID | Description | ID | Description
--- | --- | --- | ---
hsa00020 | Citrate cycle (TCA cycle) | hsa04966 | Collecting duct acid secretion
hsa00190 | Oxidative phosphorylation | hsa05010 | Alzheimer's disease
hsa00260 | Glycine, serine and threonine metabolism | hsa05012 | Parkinson's disease
hsa00620 | Pyruvate metabolism | hsa05014 | Amyotrophic lateral sclerosis
hsa01200 | Carbon metabolism | hsa05016 | Huntington disease
hsa01210 | 2-Oxocarboxylic acid metabolism | hsa05017 | Spinocerebellar ataxia
hsa01230 | Biosynthesis of amino acids | hsa05020 | Prion disease
hsa01522 | Endocrine resistance | hsa05022 | Pathways of neurodegeneration - multiple diseases
hsa03050 | Proteasome | hsa05032 | Morphine addiction
hsa04010 | MAPK signaling pathway | hsa05033 | Nicotine addiction
hsa04070 | Phosphatidylinositol signaling system | hsa05110 | Vibrio cholerae infection
hsa04071 | Sphingolipid signaling pathway | hsa05120 | Epithelial cell signaling in Helicobacter pylori infection
hsa04110 | Cell cycle | hsa05131 | Shigellosis
hsa04120 | Ubiquitin mediated proteolysis | hsa05132 | Salmonella infection
hsa04137 | Mitophagy - animal | hsa05140 | Leishmaniasis
hsa04140 | Autophagy - animal | hsa05145 | Toxoplasmosis
hsa04144 | Endocytosis | hsa05152 | Tuberculosis
hsa04145 | Phagosome | hsa05163 | Human cytomegalovirus infection
hsa04152 | AMPK signaling pathway | hsa05167 | Kaposi sarcoma-associated herpesvirus infection
hsa04211 | Longevity regulating pathway | hsa05169 | Epstein-Barr virus infection
hsa04218 | Cellular senescence | hsa05202 | Transcriptional misregulation in cancer
hsa04260 | Cardiac muscle contraction | hsa05205 | Proteoglycans in cancer
hsa04360 | Axon guidance | hsa05212 | Pancreatic cancer
hsa04625 | C-type lectin receptor signaling pathway | hsa05214 | Glioma
hsa04666 | Fc gamma R-mediated phagocytosis | hsa05215 | Prostate cancer
hsa04721 | Synaptic vesicle cycle | hsa05219 | Bladder cancer
hsa04722 | Neurotrophin signaling pathway | hsa05220 | Chronic myeloid leukemia
hsa04723 | Retrograde endocannabinoid signaling | hsa05223 | Non-small cell lung cancer
hsa04920 | Adipocytokine signaling pathway | hsa05225 | Hepatocellular carcinoma
hsa04932 | Non-alcoholic fatty liver disease | hsa05235 | PD-L1 expression and PD-1 checkpoint pathway in cancer
hsa04961 | Endocrine and other factor-regulated calcium reabsorption | |

Significant KEGG pathways obtained from DAVID (P < 0.005).

Table 3

ID | Term
--- | ---
GO:0002223 | Stimulatory C-type lectin receptor signaling pathway
GO:0006888 | ER to Golgi vesicle-mediated transport
GO:0048015 | Phosphatidylinositol-mediated signaling
GO:0038128 | ERBB2 signaling pathway
GO:0007249 | I-kappaB kinase/NF-kappaB signaling
GO:0006672 | Ceramide metabolic process
GO:0000165 | MAPK cascade
GO:0045944 | Positive regulation of transcription from RNA polymerase II promoter
GO:0007269 | Neurotransmitter secretion
GO:0035329 | Hippo signaling
GO:0006120 | Mitochondrial electron transport, NADH to ubiquinone
GO:0042776 | Mitochondrial ATP synthesis coupled proton transport
GO:0070125 | Mitochondrial translational elongation
GO:0032981 | Mitochondrial respiratory chain complex I assembly
GO:0007409 | Axonogenesis
GO:0048812 | Neuron projection morphogenesis
GO:0043524 | Negative regulation of neuron apoptotic process
GO:0007268 | Chemical synaptic transmission
GO:0060078 | Regulation of postsynaptic membrane potential
GO:0016079 | Synaptic vesicle exocytosis
GO:0048813 | Dendrite morphogenesis
GO:0090263 | Positive regulation of canonical Wnt signaling pathway
GO:0009967 | Positive regulation of signal transduction
GO:0051932 | Synaptic transmission, GABAergic
GO:0046034 | ATP metabolic process
GO:0070933 | Histone H4 deacetylation
GO:0007420 | Brain development
GO:0007417 | Central nervous system development
GO:0035357 | Peroxisome proliferator activated receptor signaling pathway
GO:0015986 | ATP synthesis coupled proton transport
GO:0040029 | Regulation of gene expression, epigenetic
GO:0007399 | Nervous system development
GO:0051966 | Regulation of synaptic transmission, glutamatergic
GO:0048488 | Synaptic vesicle endocytosis
GO:0010977 | Negative regulation of neuron projection development
GO:0060071 | Wnt signaling pathway, planar cell polarity pathway
GO:0006521 | Regulation of cellular amino acid metabolic process
GO:2000310 | Regulation of N-methyl-D-aspartate selective glutamate receptor activity
GO:0038061 | NIK/NF-kappaB signaling
GO:0035418 | Protein localization to synapse
GO:0060291 | Long-term synaptic potentiation

Significant GO terms obtained from DAVID (P < 0.005).

The first column is GO terms ID; the second column is the name of GO terms.

Figure 2

Enhanced volcano plots illustrating DEGs in all datasets. Genes with |logFC| > 0.1 and FDR < 0.05 (DEGs) are shown as red points. (A) EC, (B) HP, (C) FC, (D) TC, and (E) ADNI. Note: in the ADNI dataset, DEGs are genes identified in at least 3 of 10 resampling runs.

Figure 3

Venn diagram of the DEGs shared among EC (blue), HP (red), FC (green), TC (yellow), and ADNI (brown).

Figure 4

Venn diagrams of the (A) KEGG pathways and (B) GO terms shared among the five datasets.

Figure 5

Top 20 KEGG pathways (P < 0.005) for the five datasets. (A) EC, (B) HP, (C) FC, (D) TC, and (E) ADNI.

We used WGCNA to divide the DEGs into highly correlated gene modules. As shown in Figure 6, in each of the five datasets one module showed a highly significant positive correlation with AD. Module sizes ranged from 96 to 142 genes, which might reflect the different layers and complexity of gene regulation in the AD brain. These five AD-specific modules were used to identify potential driver genes of AD etiology and pathology. After removing genes shared between modules, we obtained 602 candidate genes from the five AD-specific modules: EC (107), HP (140), FC (142), TC (136), and ADNI (96). We hypothesized that the higher the CFG score, the more likely a candidate gene is to be an AD target, and we selected the 40 genes with CFG ≥ 4 for subsequent analyses.
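Module-trait relationships are usually computed by correlating each module's eigengene (its first principal component) with the trait. A minimal NumPy sketch of that calculation, assuming AD status coded as 0/1, is below; WGCNA's `moduleEigengenes` performs the equivalent step in R, the function names here are hypothetical, and the Student-t p-values are omitted.

```python
import numpy as np


def module_eigengene(module_expr):
    """First principal component of a module (genes x samples): one value per sample."""
    z = module_expr - module_expr.mean(axis=1, keepdims=True)
    z = z / module_expr.std(axis=1, keepdims=True)      # standardize each gene
    _, s, vt = np.linalg.svd(z, full_matrices=False)
    eigengene = vt[0] * s[0]
    # Orient the eigengene so that it follows the module's average expression.
    if np.corrcoef(eigengene, z.mean(axis=0))[0, 1] < 0:
        eigengene = -eigengene
    return eigengene


def module_trait_correlation(module_expr, is_ad):
    """Pearson correlation between the module eigengene and AD status (0/1)."""
    me = module_eigengene(module_expr)
    return np.corrcoef(me, np.asarray(is_ad, dtype=float))[0, 1]


rng = np.random.default_rng(7)
is_ad = np.array([0] * 15 + [1] * 15)
module = rng.normal(size=(20, 30)) + 2.0 * is_ad      # 20 genes, all higher in AD
print(round(module_trait_correlation(module, is_ad), 2))
```

Fixing the eigengene's sign matters: without it, a module that is up in AD could appear negatively correlated simply because SVD singular vectors are defined only up to sign.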

Figure 6

Module-trait relationships for the five datasets. Each row represents a gene co-expression module, and each column represents a clinical phenotype. Numbers are correlation coefficients, with P-values in parentheses. Correlation strength is represented by continuous color, with red indicating positive and blue negative correlation. (A) EC, (B) HP, (C) FC, (D) TC, and (E) ADNI.

Identification and Prioritization of Potential Driver Genes

Table 4 lists the 40 potential driver genes (CFG ≥ 4) prioritized by the CFG method using the AlzData database, which integrates multiple levels of AD-related data. For each gene, the table shows the eQTL, GWAS, PPI, Early_DEG, and pathology-correlation (Aβ and Tau) evidence together with the CFG score. Several of these genes have been implicated in AD by previous studies. For example, GJA1, also known as connexin 43, shows upregulated mRNA and protein levels in AD (Ren et al., 2018). RPH3A immunoreactivity is specifically reduced in AD compared with aged controls, and RPH3A loss correlates with dementia severity, cholinergic deafferentation, and increased Aβ concentrations; furthermore, RPH3A expression is selectively downregulated in cultured neurons treated with Aβ25–35 peptides (Tan et al., 2014). CASP6 activity is intimately associated with the pathologies that define AD, correlates with lower cognitive performance in aged individuals, and is involved in axonal degeneration in several cellular and in vivo animal models (LeBlanc, 2013). Angiotensinogen (AGT) levels are increased in the cerebrospinal fluid of patients with mild cognitive impairment and AD (Mateos et al., 2011). Stromal cell-derived factor 1 (SDF1), also known as chemokine CXCL12, is a proinflammatory chemokine that is highly expressed in the central nervous system, where it may regulate synaptic transmission in excitable neurons and modulate neuron-glia communication. CXCL12 has been detected in the plasma and hippocampus of AD patients at levels considerably lower than in controls (Dulewicz et al., 2020). In summary, combining WGCNA with CFG offers a useful tool for prioritizing potential AD genes.

Table 4

GeneAD-related evidenceCFG
eQTLGWASPPIEarly_DEGPathology cor
(Aβ)(Tau)
GJA122PSEN1, MAPT, APOEyes0.388**0.131ns5
FOXO110PSEN2yes0.270ns0.526*4
PRKX3NAPSEN1yes0.352*–0.023ns4
RPH3A52-yes–0.199ns–0.738**4
CASP650APP, PSEN1, PSEN2, MAPTyes0.482***0.738**4
CRMP113MAPTNA–0.304*–0.506ns4
RGS4132-yes–0.419**–0.579*4
NPTX211-yes–0.688***–0.783***4
RPS2710PSEN2yes0.503***0.662**4
MEGF1038-yes0.559***0.120ns4
AP2A110APP, PSEN2, MAPTyes–0.277ns–0.585*4
PITPNC1101-yes–0.128ns–0.638*4
AGT10APP, PSEN1, APOEyes–0.359*0.002ns4
AQP474-yes0.800***0.275ns4
MYT1L312-yes–0.488***–0.583*4
IQGAP110PSEN1yes0.310*0.282ns4
IGFBP780MAPT, APOEyes0.353*0.510ns4
CITED210APP, PSEN1, APOEyes–0.433**–0.772***4
SMAD1161APP, APOENA–0.332*–0.497ns4
CDH701PSEN1yes–0.345*–0.691**4
MSRB252-yes0.32*0.609*4
DBI11-yes0.780***0.718**4
PELI220PSEN2yes0.591***–0.107ns4
AVEN11-yes0.525***0.008ns4
F13A173APP, APOENA0.195ns0.623*4
SLA10PSEN1, MAPTyes0.114ns0.662**4
ADAMTS20217-yes0.085ns0.587*4
RARB62PSEN2yes–0.064ns–0.387ns4
SDC283PSEN1, PSEN2, MAPT, APOEyes0.041ns0.086ns4
DCN80APP, PSEN1, MAPT, APOEyes–0.416**0.546*4
CCR510APPyes0.769***0.616*4
GPRC5B241-yes0.307*–0.248ns4
IRF510APP, PSEN1, PSEN2, MAPT, APOEyes0.879***0.839***4
IGFBP780MAPT, APOEyes0.353*0.510ns4
CXCL1210APP, PSEN2, MAPT, APOEyes0.432**–0.069ns4
CREM10PSEN1, MAPT, APOEyes–0.439**–0.396ns4
EHHADH140MAPT, APOEyes0.438**–0.022ns4
SLC1A371-yes0.651***0.494ns4
VAV305MAPTyes0.319*–0.284ns4
IL15218-yes0.623***0.685**4

The 40 potential driver genes prioritized by the CFG method based on the AlzData database.

"NA," not applicable due to missing data for the target gene. AD, Alzheimer's disease; CFG, convergent functional genomics score, based on the total number of lines of AD-related evidence; DEG, differentially expressed gene; eQTL, total number of risk SNPs in the IGAP data set that regulate expression of the target gene; GWAS, total number of risk SNPs within the target gene in the IGAP data set; PPI, AD core genes (APP, PSEN1, PSEN2, MAPT, and APOE) with a significant protein-protein interaction with the target gene; Early_DEG, target gene is differentially expressed in AD mouse models before the emergence of AD pathology; Pathology cor, correlation between target-gene expression and AD pathology in Aβ-line AD mice (marked Aβ) and Tau-line AD mice (marked Tau). *P < 0.05; **P < 0.01; ***P < 0.001; ns, not significant.

Candidate Gene GJA1

As shown in Table 4, GJA1 has the highest CFG score among all potential driver genes and was therefore regarded as the candidate gene. We combined deepDTnet and SAveRUNNER to search for associations between the candidate gene GJA1 and AD based on a target-drug-disease network. As shown in Figure 7, the network consists of 13 drugs, the candidate gene GJA1, and neurodegenerative diseases. Eleven new drug-target interactions and 13 new drug-disease associations were identified by deepDTnet and SAveRUNNER, respectively. Notably, the predicted link between dopamine and AD is supported by previous studies. Dopamine, a compound of the catecholamine and phenethylamine families that plays important roles in the human brain, was predicted by deepDR to be associated with AD, a prediction supported by a previous study indicating that a lack of dopamine in the brain may cause some of the earliest symptoms of AD (Zeng et al., 2019). Dysfunction of dopaminergic transmission has been hypothesized to be a new player in the pathophysiology of AD. Dopamine acts through five types of receptors, generally divided into two main subclasses: D1-like [comprising the dopamine 1 receptor (D1R) and the dopamine 5 receptor (D5R)] and D2-like [comprising the dopamine 2 receptor (D2R), the dopamine 3 receptor (D3R), and the dopamine 4 receptor (D4R)]. Pan et al. found that the levels of dopamine, D1R, and D2R were decreased in patients with AD compared with controls. Moreover, decreased levels of dopamine and D2-like receptors were linked with the pathophysiology of AD because of their strong rank correlations with AD (Pan et al., 2020). In conclusion, the candidate gene GJA1 is the most likely AD target among the genes examined.

Figure 7

Drug-GJA1-disease interaction network. The network contains the candidate target GJA1 (green), neurodegenerative diseases (red), and 13 drugs (yellow). Gray lines indicate known interactions; green and red lines indicate new interactions predicted by deepDTnet and SAveRUNNER, respectively.

Discussion

Pathway enrichment analysis was performed to interpret the function of these DEGs. KEGG analysis showed that the 7,567 DEGs were significantly enriched in the MAPK signaling pathway, which comprises the ERK, p38, and JNK cascades. In the adult nervous system, ERK activation is necessary for synaptic plasticity and memory formation (Du et al., 2019). p38 is highly expressed in the brains of AD patients, and Aβ-induced p38 activation increases tau phosphorylation and promotes the amyloidogenic processing of APP (Giraldo et al., 2014; Gourmaud et al., 2015). In a mouse model of AD, the JNK signaling pathway is overactivated in dendritic spines before cognitive decline (Sclip et al., 2014). These studies suggest that overactivation of the MAPK signaling pathway may contribute to the onset of AD. Preventing MAPK overactivation may therefore be an effective strategy to reduce Aβ deposition, Tau hyperphosphorylation, neuronal apoptosis, and memory impairment, and MAPKs could be potential targets for novel and effective AD therapeutics (Yenki et al., 2013; Feld et al., 2014).

GO term analysis indicated that the 7,567 DEGs were mainly involved in chemical synaptic transmission; regulation of postsynaptic membrane potential; synaptic vesicle exocytosis; synaptic transmission; GABAergic synapses; regulation of synaptic transmission, glutamatergic; synaptic vesicle endocytosis; long-term synaptic potentiation; neurotransmitter secretion; neuron projection morphogenesis; negative regulation of neuron apoptotic process; and negative regulation of neuron projection development. Damage to neuronal and synaptic function has long been considered an important pathological feature of neurodegenerative diseases, and decreased synaptic activity is considered the pathological feature most relevant to cognitive impairment in AD (Wu et al., 2019). For example, downregulation of GABAergic synapses is closely related to the loss of GABAergic inhibition (Kim et al., 2020). GABAergic neurotransmission has been found to be closely related to several aspects of AD pathology, including Aβ toxicity and tau hyperphosphorylation (Kadoyama et al., 2021). The level of the inhibitory neurotransmitter GABA was significantly reduced in AD patients, suggesting insufficient synaptic function and neuronal transmission in AD (Schmitz et al., 2017). In addition, studies in a mouse model of AD indicate that impaired hippocampal neurogenesis may be mediated by dysfunctional GABAergic signaling or by an imbalance between excitatory and inhibitory synapses (Sun et al., 2009). Therefore, GABAergic synapses play an important role not only in hippocampal function but also in the pathogenesis of AD.
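Because GO analysis tests many terms at once, raw p-values are usually corrected for multiple testing before terms are called enriched. DAVID reports several corrected values, including a Benjamini column, but this section does not state which correction or threshold was applied, so the sketch below only illustrates the standard Benjamini-Hochberg procedure; the p-values in the example are made up.

```python
def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down; the running minimum keeps the
    # adjusted values monotone and capped at 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four GO terms.
raw = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(raw))
```

Terms whose adjusted value falls below the chosen cutoff (for example 0.05) would then be reported as enriched.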

Limitations

There are some limitations in this study. First, although we identified 23 potential driver genes of AD using WGCNA and CFG, these approaches prioritize genes rather than identify true causal genes. Therefore, further biological validation of the identified genes is necessary in future studies. Second, four of the five datasets were downloaded from AlzData, which retained only the genes common to the different studies during cross-platform normalization, so genes measured in only some studies could not be evaluated. Third, the sample sizes of the EC, HP, and TC datasets available for analysis were still limited, and the larger FC and ADNI datasets may have had a greater influence on the results. Fourth, the rapid development of various omics technologies provides new opportunities for understanding AD; however, we used only transcriptomics data to identify potential driver genes. Finally, additional potential AD genes were not considered. Deep learning, a machine learning approach based on artificial neural networks (computational models inspired by the structure of the human brain), may be able to uncover more hidden genes in the data. The main difference between deep learning and traditional artificial neural networks lies in the scale and complexity of the network structure: deep networks have many hidden layers, whereas traditional artificial neural networks usually have only one. This is because big data and GPU hardware support were lacking in the last century. With the emergence of more powerful CPU and GPU hardware, deep learning was developed on the basis of artificial neural networks, using more hidden layers and more nodes in each hidden layer (Esteva et al., 2019; Zou et al., 2019).
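The architectural difference described above can be sketched as two multilayer perceptrons that share the same forward pass but differ in depth and width. This is a purely illustrative, untrained example in plain Python: the layer sizes, the ReLU activation, and the helper names are our assumptions, not a model used in this study.

```python
import random


def relu(v):
    return [max(0.0, x) for x in v]


def dense(v, weights, bias):
    """Fully connected layer; weights has one row per output unit."""
    return [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(weights, bias)]


def init_mlp(layer_sizes, seed=0):
    """Small random weights for layer sizes [input, hidden..., output]."""
    rng = random.Random(seed)
    params = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
        params.append((weights, [0.0] * n_out))
    return params


def forward(params, x):
    """ReLU after every hidden layer, linear output layer."""
    for weights, bias in params[:-1]:
        x = relu(dense(x, weights, bias))
    weights, bias = params[-1]
    return dense(x, weights, bias)


def n_hidden_layers(params):
    return len(params) - 1


def n_params(params):
    return sum(len(w) * len(w[0]) + len(b) for w, b in params)


# Traditional ANN: a single hidden layer.
shallow = init_mlp([100, 32, 1])
# "Deep" network: several, wider hidden layers on the same input.
deep = init_mlp([100, 256, 128, 64, 1])

rng = random.Random(1)
x = [rng.random() for _ in range(100)]  # e.g. 100 expression values for one sample
for name, net in (("shallow", shallow), ("deep", deep)):
    print(name, n_hidden_layers(net), "hidden layer(s),",
          n_params(net), "parameters, output", forward(net, x))
```

Here the shallow network has 1 hidden layer and 3,265 parameters, while the deep one has 3 hidden layers and 67,073 parameters; training such models at scale is what the GPU hardware mentioned above makes practical.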

Conclusions

In this study, we identified potential driver genes from AD-specific modules using multiple transcriptomics datasets and, using DAVID 6.8, observed that the DEGs were significantly enriched in several pathways, consistent with observations from previous studies. Moreover, by combining WGCNA, CFG, and drug-target-disease network prediction, we found that the candidate gene GJA1 is the most likely target of AD, as reported in a previous study. In summary, identification of AD-related genes contributes to understanding the pathophysiology of AD and to identifying candidate drug targets for the development of new drugs.

Funding

This work was supported by China Postdoctoral Science Foundation (2020M671125) and start-up grant of the Shanghai Jiao Tong University (WF220408213).

Publisher's Note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Statements

Data availability statement

The original contributions presented in the study are publicly available. This data can be found here: https://github.com/Macau-LYXia/Transcriptomics-Data-for-AD. Data used in the preparation of this article were obtained from the AlzData (http://www.alzdata.org/) and Alzheimer's Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu).

Author contributions

L-YX and LT collected the datasets and analyzed the data. L-YX, LT, HH, and JL contributed to the interpretation of the results and revised the manuscript. L-YX took the lead in writing the manuscript. All authors contributed to the article and approved the submitted version.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  • 1

    Alzheimer's Association. (2018). 2018 Alzheimer's disease facts and figures. Alzheimers Dement. 14, 367–429. doi: 10.1016/j.jalz.2018.02.001

  • 2

    Andrews, S. J., Fulton-Howard, B., and Goate, A. (2020). Interpretation of risk loci from genome-wide association studies of Alzheimer's disease. Lancet Neurol. 19, 326–335. doi: 10.1016/S1474-4422(19)30435-1

  • 3

    Ayalew, M., Le-Niculescu, H., Levey, D., Jain, N., Changala, B., Patel, S., et al. (2012). Convergent functional genomics of schizophrenia: from comprehensive understanding to genetic risk prediction. Mol. Psychiatry 17, 887–905. doi: 10.1038/mp.2012.37

  • 4

    Botía, J. A., Vandrovcova, J., Forabosco, P., Guelfi, S., D'Sa, K., Hardy, J., et al. (2017). An additional k-means clustering step improves the biological features of WGCNA gene co-expression networks. BMC Syst. Biol. 11, 1–16. doi: 10.1186/s12918-017-0420-6

  • 5

    Chu, Y., Shan, X., Chen, T., Jiang, M., Wang, Y., Wang, Q., et al. (2021). DTI-MLCD: predicting drug-target interactions using multi-label learning with community detection method. Brief. Bioinform. 22, bbaa205. doi: 10.1093/bib/bbaa205

  • 6

    Cummings, J., Lee, G., Ritter, A., Sabbagh, M., and Zhong, K. (2019). Alzheimer's disease drug development pipeline: 2019. Alzheimers Dement. (N. Y.) 5, 272–293. doi: 10.1016/j.trci.2019.05.008

  • 7

    Darby, R. R., Joutsa, J., and Fox, M. D. (2019). Network localization of heterogeneous neuroimaging findings. Brain 142, 70–79. doi: 10.1093/brain/awy292

  • 8

    Doxtater, K., Tripathi, M. K., and Khan, M. M. (2020). Recent advances on the role of long non-coding RNAs in Alzheimer's disease. Neural Regen. Res. 15, 2253. doi: 10.4103/1673-5374.284990

  • 9

    Du, Y., Du, Y., Zhang, Y., Huang, Z., Fu, M., Li, J., et al. (2019). MKP-1 reduces Aβ generation and alleviates cognitive impairments in Alzheimer's disease models. Signal Transduct. Target. Ther. 4, 1–12. doi: 10.1038/s41392-019-0091-4

  • 10

    Dulewicz, M., Kulczynska-Przybik, A., Borawski, B., Klimkowicz-Mrowiec, A., Pera, J., Slowik, A., et al. (2020). The cerebrospinal fluid stromal cell-derived factor 1 (CXCL12) concentration in Alzheimer's disease: biomarkers (non-neuroimaging)/novel biomarkers. Alzheimers Dement. 16, e042573. doi: 10.1002/alz.042573

  • 11

    Esteva, A., Robicquet, A., Ramsundar, B., Kuleshov, V., DePristo, M., Chou, K., et al. (2019). A guide to deep learning in healthcare. Nat. Med. 25, 24–29. doi: 10.1038/s41591-018-0316-z

  • 12

    Falcone, R., Conte, F., Fiscon, G., Pecce, V., Sponziello, M., Durante, C., et al. (2019). BRAF V600E-mutant cancers display a variety of networks by SWIM analysis: prediction of vemurafenib clinical response. Endocrine 64, 406–413. doi: 10.1007/s12020-019-01890-4

  • 13

    Feld, M., Krawczyk, M. C., Sol Fustinana, M., Blake, M. G., Baratti, C. M., Romano, A., et al. (2014). Decrease of ERK/MAPK overactivation in prefrontal cortex reverses early memory deficit in a mouse model of Alzheimer's disease. J. Alzheimers Dis. 40, 69–82. doi: 10.3233/JAD-131076

  • 14

    Femminella, G. D., Ferrara, N., and Rengo, G. (2015). The emerging role of microRNAs in Alzheimer's disease. Front. Physiol. 6, 40. doi: 10.3389/fphys.2015.00040

  • 15

    Fiscon, G., Conte, F., Farina, L., and Paci, P. (2021). SAveRUNNER: a network-based algorithm for drug repurposing and its application to COVID-19. PLoS Comput. Biol. 17, e1008686. doi: 10.1371/journal.pcbi.1008686

  • 16

    Giraldo, E., Lloret, A., Fuchsberger, T., and Viña, J. (2014). Aβ and tau toxicities in Alzheimer's are linked via oxidative stress-induced p38 activation: protective role of vitamin E. Redox Biol. 2, 873–877. doi: 10.1016/j.redox.2014.03.002

  • 17

    Gourmaud, S., Paquet, C., Dumurgier, J., Pace, C., Bouras, C., Gray, F., et al. (2015). Increased levels of cerebrospinal fluid JNK3 associated with amyloid pathology: links to cognitive decline. J. Psychiatry Neurosci. 40, 151. doi: 10.1503/jpn.140062

  • 18

    Hao, S., Wang, R., Zhang, Y., and Zhan, H. (2019). Prediction of Alzheimer's disease-associated genes by integration of GWAS summary data and expression data. Front. Genet. 9, 653. doi: 10.3389/fgene.2018.00653

  • 19

    Huang, D. W., Sherman, B. T., Tan, Q., Kir, J., Liu, D., Bryant, D., et al. (2007). DAVID Bioinformatics Resources: expanded annotation database and novel algorithms to better extract biology from large gene lists. Nucleic Acids Res. 35(Suppl_2), W169–W175. doi: 10.1093/nar/gkm415

  • 20

    Kadoyama, K., Matsuura, K., Takano, M., Otani, M., Tomiyama, T., Mori, H., et al. (2021). Proteomic analysis involved with synaptic plasticity improvement by GABAA receptor blockade in hippocampus of a mouse model of Alzheimer's disease. Neurosci. Res. 165, 61–68. doi: 10.1016/j.neures.2020.04.004

  • 21

    Kim, S., Kim, H., Park, D., Kim, J., Hong, J., Kim, J. S., et al. (2020). Loss of IQSEC3 disrupts GABAergic synapse maintenance and decreases somatostatin expression in the hippocampus. Cell Rep. 30, 1995–2005. doi: 10.1016/j.celrep.2020.01.053

  • 22

    Lanoiselée, H.-M., Nicolas, G., Wallon, D., Rovelet-Lecrux, A., Lacour, M., Rousseau, S., et al. (2017). APP, PSEN1, and PSEN2 mutations in early-onset Alzheimer disease: a genetic screening study of familial and sporadic cases. PLoS Med. 14, e1002270. doi: 10.1371/journal.pmed.1002270

  • 23

    LeBlanc, A. C. (2013). Caspase-6 as a novel early target in the treatment of Alzheimer's disease. Eur. J. Neurosci. 37, 2005–2018. doi: 10.1111/ejn.12250

  • 24

    Liu, F., Zhang, Z., Chen, W., Gu, H., and Yan, Q. (2018). Regulatory mechanism of microRNA-377 on CDH13 expression in the cell model of Alzheimer's disease. Eur. Rev. Med. Pharmacol. Sci. 22, 2801–2808. doi: 10.26355/eurrev_201805_14979

  • 25

    Mateos, L., Ismail, M.-A.-M., Gil-Bea, F.-J., Leoni, V., Winblad, B., Björkhem, I., et al. (2011). Upregulation of brain renin angiotensin system by 27-hydroxycholesterol in Alzheimer's disease. J. Alzheimers Dis. 24, 669–679. doi: 10.3233/JAD-2011-101512

  • 26

    Nangraj, A. S., Selvaraj, G., Kaliamurthi, S., Kaushik, A. C., Cho, W. C., and Wei, D. Q. (2020). Integrated PPI- and WGCNA-retrieval of hub gene signatures shared between Barrett's esophagus and esophageal adenocarcinoma. Front. Pharmacol. 11, 881. doi: 10.3389/fphar.2020.00881

  • 27

    Öztürk, H., Özgür, A., and Ozkirimli, E. (2018). DeepDTA: deep drug-target binding affinity prediction. Bioinformatics 34, i821–i829. doi: 10.1093/bioinformatics/bty593

  • 28

    Pan, X., Kaminga, A. C., Jia, P., Wen, S. W., Acheampong, K., and Liu, A. (2020). Catecholamines in Alzheimer's disease: a systematic review and meta-analysis. Front. Aging Neurosci. 12, 184. doi: 10.3389/fnagi.2020.00184

  • 29

    Patel, H., Dobson, R. J., and Newhouse, S. J. (2019). A meta-analysis of Alzheimer's disease brain transcriptomic data. J. Alzheimers Dis. 68, 1635–1656. doi: 10.3233/JAD-181085

  • 30

    Ren, R., Zhang, L., and Wang, M. (2018). Specific deletion connexin43 in astrocyte ameliorates cognitive dysfunction in APP/PS1 mice. Life Sci. 208, 175–191. doi: 10.1016/j.lfs.2018.07.033

  • 31

    Ren, Z.-H., Shang, G.-P., Wu, K., Hu, C.-Y., and Ji, T. (2020). WGCNA co-expression network analysis reveals ILF3-AS1 functions as a ceRNA to regulate PTBP1 expression by sponging miR-29a in gastric cancer. Front. Genet. 11, 39. doi: 10.3389/fgene.2020.00039

  • 32

    Schmitz, T. W., Correia, M. M., Ferreira, C. S., Prescot, A. P., and Anderson, M. C. (2017). Hippocampal GABA enables inhibitory control over unwanted thoughts. Nat. Commun. 8, 1–12. doi: 10.1038/s41467-017-00956-z

  • 33

    Sclip, A., Tozzi, A., Abaza, A., Cardinetti, D., Colombo, I., Calabresi, P., et al. (2014). c-Jun N-terminal kinase has a key role in Alzheimer disease synaptic dysfunction in vivo. Cell Death Dis. 5, e1019. doi: 10.1038/cddis.2013.559

  • 34

    Sun, B., Halabisky, B., Zhou, Y., Palop, J. J., Yu, G., Mucke, L., et al. (2009). Imbalance between GABAergic and glutamatergic transmission impairs adult neurogenesis in an animal model of Alzheimer's disease. Cell Stem Cell 5, 624–633. doi: 10.1016/j.stem.2009.10.003

  • 35

    Tan, M. G., Lee, C., Lee, J. H., Francis, P. T., Williams, R. J., Ramírez, M. J., et al. (2014). Decreased rabphilin 3A immunoreactivity in Alzheimer's disease is associated with Aβ burden. Neurochem. Int. 64, 29–36. doi: 10.1016/j.neuint.2013.10.013

  • 36

    Wijsman, E. M., Pankratz, N. D., Choi, Y., Rothstein, J. H., Faber, K. M., Cheng, R., et al. (2011). Genome-wide association of familial late-onset Alzheimer's disease replicates BIN1 and CLU and nominates CUGBP2 in interaction with APOE. PLoS Genet. 7, e1001308. doi: 10.1371/journal.pgen.1001308

  • 37

    Wood, I. C. (2018). The contribution and therapeutic potential of epigenetic modifications in Alzheimer's disease. Front. Neurosci. 12, 649. doi: 10.3389/fnins.2018.00649

  • 38

    Wu, M., Fang, K., Wang, W., Lin, W., Guo, L., and Wang, J. (2019). Identification of key genes and pathways for Alzheimer's disease via combined analysis of genome-wide expression profiling in the hippocampus. Biophys. Rep. 5, 98–109. doi: 10.1007/s41048-019-0086-2

  • 39

    Xia, L.-Y., Yang, Z.-Y., Zhang, H., and Liang, Y. (2019). Improved prediction of drug-target interactions using self-paced learning with collaborative matrix factorization. J. Chem. Inf. Model. 59, 3340–3351. doi: 10.1021/acs.jcim.9b00408

  • 40

    Xu, M., Zhang, D.-F., Luo, R., Wu, Y., Zhou, H., Kong, L.-L., et al. (2018). A systematic integrated analysis of brain expression profiles reveals YAP1 and other prioritized hub genes as important upstream regulators in Alzheimer's disease. Alzheimers Dement. 14, 215–229. doi: 10.1016/j.jalz.2017.08.012

  • 41

    Yenki, P., Khodagholi, F., and Shaerzadeh, F. (2013). Inhibition of phosphorylation of JNK suppresses Aβ-induced ER stress and upregulates prosurvival mitochondrial proteins in rat hippocampus. J. Mol. Neurosci. 49, 262–269. doi: 10.1007/s12031-012-9837-y

  • 42

    Zeng, X., Zhu, S., Hou, Y., Zhang, P., Li, L., Li, J., et al. (2020a). Network-based prediction of drug-target interactions using an arbitrary-order proximity embedded deep forest. Bioinformatics 36, 2805–2812. doi: 10.1093/bioinformatics/btaa010

  • 43

    Zeng, X., Zhu, S., Liu, X., Zhou, Y., Nussinov, R., and Cheng, F. (2019). deepDR: a network-based deep learning approach to in silico drug repositioning. Bioinformatics 35, 5191–5198. doi: 10.1093/bioinformatics/btz418

  • 44

    Zeng, X., Zhu, S., Lu, W., Liu, Z., Huang, J., Zhou, Y., et al. (2020b). Target identification among known drugs by deep learning from heterogeneous networks. Chem. Sci. 11, 1775–1797. doi: 10.1039/C9SC04336E

  • 45

    Zou, J., Huss, M., Abid, A., Mohammadi, P., Torkamani, A., and Telenti, A. (2019). A primer on deep learning in genomics. Nat. Genet. 51, 12–18. doi: 10.1038/s41588-018-0295-5

Summary

Keywords

Alzheimer's disease, transcriptomics, drug repurposing, deep learning, drug-target interaction

Citation

Xia L-Y, Tang L, Huang H and Luo J (2022) Identification of Potential Driver Genes and Pathways Based on Transcriptomics Data in Alzheimer's Disease. Front. Aging Neurosci. 14:752858. doi: 10.3389/fnagi.2022.752858

Received

03 August 2021

Accepted

21 February 2022

Published

18 March 2022

Volume

14 - 2022

Edited by

Chih-Yu Hsu, Fujian University of Technology, China

Reviewed by

Lorenzo Farina, Sapienza University of Rome, Italy; Christoph Preuss, Jackson Laboratory, United States


*Correspondence: Jie Luo

This article was submitted to Alzheimer's Disease and Related Dementias, a section of the journal Frontiers in Aging Neuroscience

