Exploring the Key Genes and Identification of Potential Diagnosis Biomarkers in Alzheimer’s Disease Using Bioinformatics Analysis

Background Alzheimer’s disease (AD) is one of the major threats of the twenty-first century and lacks available therapy. Identification of novel molecular markers for diagnosis and treatment of AD is urgently demanded, and genetic biomarkers show potential prospects. Method We identify and intersected differentially expressed genes (DEGs) from five microarray datasets to detect consensus DEGs. Based on these DEGs, we conducted Gene Ontology (GO), performed the Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis, constructed a protein—protein interaction (PPI) network, and utilized Cytoscape to identify hub genes. The least absolute shrinkage and selection operator (LASSO) logistic regression was applied to identify potential diagnostic biomarkers. Gene set enrichment analysis (GSEA) was performed to investigate the biological functions of the key genes. Result We identified 608 consensus DEGs, several dysregulated pathways, and 18 hub genes. Sixteen hub genes dysregulated as AD progressed. The diagnostic model of 35 genes was constructed, which has a high area under the curve (AUC) value in both the validation dataset and combined dataset (AUC = 0.992 and AUC = 0.985, respectively). The model can also differentiate mild cognitive impairment and AD patients from controls in two blood datasets. Brain-derived neurotrophic factor (BDNF) and WW domain-containing transcription regulator protein 1 (WWTR1), which are associated with the Braak stage, Aβ 42 levels, and β-secretase activity, were identified as critical genes of AD. Conclusion Our study identified 16 hub genes correlated to the neuropathological stage and 35 potential biomarkers for the diagnosis of AD. WWTR1 were identified as candidate genes for future studies. This study deepens our understanding of the transcriptomic and functional features and provides new potential diagnostic biomarkers and therapeutic targets for AD.


INTRODUCTION
Alzheimer's disease (AD) is the most common neurodegenerative disease in the elderly, affecting more than 35.6 million people worldwide (Querfurth and LaFerla, 2010;Kumar et al., 2015). Epidemiological analysis has predicted that the number will rise to 65.7 million in 2030 and approximately 115.4 million in 2050 (Prince et al., 2013). The symptoms usually start with subtle memory loss and gradually progress to affect other cognitive domains as the condition deteriorates, such as language, visuospatial skills, motor skills, executive function, and activities of daily living (McKhann et al., 2011;Huang and Mucke, 2012). As AD usually has concealed onset, most patients with AD are already at an advanced stage at the time of diagnosis. Furthermore, the long-term care and related costs of AD contribute a substantial economic burden to the society and family. It is reported that the global societal cost for dementia is projected to grow to approximately $2 trillion in 2030 (Wimo et al., 2017). Unfortunately, despite recent progress in understanding the neurobiology and pathophysiology of AD so far, no therapeutic strategies can effectively prevent or cure AD. Therefore, research directed toward identifying AD biomarkers is needed for the early diagnosis, prevention, and treatment of AD.
AD is a characteristic "complex" disease resulting from the interaction of genetic and environmental factors. It is known that the primary pathogenesis of AD was β-amyloid (Aβ) abnormal deposition, neurofibrillary tangles induced by phosphorylation of tau proteins, inflammatory response, oxidative stress, and neuronal apoptosis. All these processes involve alterations in the expression and regulation of numerous genes. Studies suggest that genetic factors are estimated to attribute up to 79% to the risk for AD (Wingo et al., 2012). The apolipoprotein E (APOE) ε4 allele has been identified as the most substantial risk factor for AD Sherrington et al., 1995). Mutations in the genes which enhanced generation and aggregation of Aβ, such as amyloid precursor protein (APP), presenilin (1) (PSEN1), and presenilin (2) (PSEN2), were included in the established genetic causes of familial AD (Sorbi et al., 2001;Tanzi and Bertram, 2005). Moreover, genetic analyses have suggested that the individual differences and complicated pathogenesis of AD may be influenced by multiple genes and their variants involved in numerous biological functions and substantially increase the risk of the disease (Hokama et al., 2014;Stopa et al., 2018). Therefore, identification and comprehensive analyses of potential candidate genes will considerably increase our understanding of the biological mechanisms involved in disease pathogenesis and could potentially be used as diagnostic or predictive biomarkers for AD.
In recent years, bioinformatics analysis is widely applied in molecular biology experiments and clinical practice (Banwait and Bastola, 2015), revealing the key pathways and drug targets in complex diseases . Thus, joint analysis of the array-based data of AD may be a novel analytical strategy. Our present study aims to reveal the transcriptomic characteristics and identification of novel biomarkers of AD for diagnosis and treatment. We identified co-differentially expressed genes (DEGs) in AD of five microarray datasets in the Gene Expression Omnibus (GEO). Based on the results, we performed a series of analyses, including Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment pathway analysis, protein-protein interaction (PPI) analysis, and least absolute shrinkage and selection operator (LASSO) logistic regression analysis. We identified 18 hub genes and tested their expression levels in different Braak stages. A 35-gene-based diagnosis model was constructed, and then we test the diagnostic values for AD and mild cognitive impairment (MCI). Finally, two key genes were identified by overlapping the 18 hub genes and 35 diagnosis genes. We further explored their correlations with β-secretase activity and Aβ 42 levels. Gene set enrichment analysis (GSEA) was used to explore the potential biological functions of hub genes. Our present study could provide more insights into the molecular mechanism of AD and provided potential biomarker candidates for clinical diagnosis and treatment.

Data Processing
GEO 1 is a public functional genomics data repository of high-throughput gene expression data, chips, and microarrays. According to the following criteria, datasets were considered eligible for our analysis: (1) datasets with AD samples; (2) datasets supported by peer-reviewed PubMed-indexed publications; and (3) studies with information about the technology and platform utilized for studies. We selected 10 datasets (GSE33000, GSE36980, GSE48350, GSE5281, GSE122063, GSE106241, GSE4226, GSE97760, GSE63060, and GSE63061) related to AD for analysis. A total of 757 non-demented healthy control subjects (NDHCS) and 932 AD patients were analyzed. We extracted the whole data of a single study, including all brain regions, for analysis. The data sample collection is shown in Table 1. The flowchart of the study is illustrated in Figure 1.
The GSE33000 (platform GPL4372) was composed of postmortem prefrontal cortex (PFC) samples of 157 NDHCS and 310 AD patients with matched genotype and clinical data. The GSE36980 (platform GPL6244) was composed of the frontal cortex (FC), temporal cortex (TC), and hippocampus (HPC) from 47 NDHCS and 32 AD patients. From GSE48350 (platform GPL570), we selected data from the HPC, entorhinal cortex (EC), superior frontal cortex (SFC), and post-central gyrus (PCGY) derived from 253 postmortem brains, among which 80 cases were diagnosed as having AD. The GSE122063 (platform GPL16699) was composed of tissues collected from FC and TC, 44 tissues from NDHCS and 56 from AD patients. The GSE5281 (platform GPL570) was composed of tissues collected by laser capture microscopy (LCM) from 74 NDHCS and 87 AD patients. The brain regions included the EC, HPC, medial temporal gyrus, posterior cingulate, SFC, and primary visual cortex. The GSE106241 was enrolled for independent external validation. In The first five datasets were used for combined analysis, GSE106241 was used for independent validation analysis, and the last four datasets were used for evaluating the diagnosis model in peripheral blood. AD, Alzheimer's disease; MCI, mild cognitive impairment; NDHCS, non-demented healthy control subjects. GPL, Gene Expression Omnibus Platform.
this dataset, 60 human temporal cortical tissue samples were included and divided into seven groups based on Braak staging. Gene expression profiles of peripheral blood were obtained from GSE97760, GSE4226, GSE63060, and GSE63061. Study subjects from GSE97760 (platform GPL16699) were all female, including nine subjects with advanced AD and 10 age-matched NDHCS. The GSE4226 (platform GPL1211) was composed of peripheral blood mononuclear cells from 14 NDHCS and 14 AD patients. Datasets GSE63060 and GSE63061 were composed of MCI patients, AD patients, and NDHCS. There are 329 samples (145 AD, 80 MCI, and 104 NDHCS) in GSE63060 and 382 samples (139 AD, 109 MCI, 134 NDHCS) in GSE63061. Three borderline MCI samples, one NDHCS-to-AD sample, one MCI-to-NDHCS sample, and one other sample, were excluded from GSE63061.

Identification of Consensus DEGs
As single datasets and few samples may weaken the credibility of the results, data integration is necessary to look for findings supported by several pieces of evidence and investigate the complex genetic mechanisms (Pineda et al., 2015). Therefore, five brain tissue datasets (GSE33000, GSE36980, GSE48350, GSE5281, and GSE122063) were selected to identify consensus DEGs. We used the impute package to supplement missing data (Troyanskaya et al., 2001). Then, the normalizeBetweenArrays function in the limma package was used to normalize gene expression. Next, we performed the differential analysis in each of the datasets. We screened DEGs by comparing AD tissues to NDHCS tissues in the R computing environment using the limma package (Ritchie et al., 2015). DEGs were determined by | Log 2 FC| > 0, adjusted p-value < 0.05. Volcano plots were generated using ggplot 2 in R. In order to obtain a consensus of DEGs, Venn analysis was performed using Draw Venn Diagram, a Web-based tool, 2 to identify common DEGs from the five datasets. The heatmap of the consensus DEGs was drawn using the R pheatmap package. We performed the batch correction, followed by normalization between arrays to remove the heterogeneity among multiple microarray datasets using sva and limma packages (Leek et al., 2012). Finally, principal component analysis (PCA) was performed to compare the difference of consensus DEGs between AD and NDHCS groups in different brain regions.

GO Enrichment and KEGG Pathway Analysis of the DEGs
GO enrichment analyses were performed in R using the function of clusterProfiler14. Metascape 3 was used to perform the KEGG pathway analysis. Functional and pathway enrichment analyses were conducted separately for upregulated and downregulated genes. In this analysis, a p-value < 0.05 was considered significant for the screening of significant GO terms and KEGG pathways. Furthermore, we performed the differential analysis separately in 10 brain regions. DEGs were determined by | Log 2 FC| > 0, adjusted p-value < 0.05. The top 100 upregulated DEGs and top 100 downregulated DEGs of each brain region were used for GO enrichment analyses. Finally, we take the intersection of

PPI Network Construction, Hub Gene Selection, and Hub Gene Expression in Different Braak Stages of AD
To further explore the interactions among DEGs, PPI network analysis was performed using the online database STRING with an interaction score of 0.4 as the threshold. Next, we utilized Cytoscape (version 3.7.1) to construct and visualize the main regulatory network. We used cytoHubba, a plugin of Cytoscape, to select the hub genes in the PPI network. Five methods (degree, maximum neighborhood component (MNC), radiality centrality, stress centrality, closeness centrality) were used to sequence and evaluate central genes (Chin et al., 2014). The five ranked methods selected the top 20 hub genes. Venn analysis was performed to identify central genes by overlapping the top 20 genes. Finally, we validate hub gene expressions in different Braak stages using the HPC and superior frontal cortex (SPF) samples in the dataset GSE48350. Overall differences between groups were tested with the Kruskal-Wallis (K-W) test, and differences between groups were compared by the Wilcox test. The boxplot was drawn by R package ggplot 2. The top 10 hub genes were also further evaluated based on the difference in the gene expression of GSE48350.

Identification of Potential Biomarkers of AD Using LASSO Logistic Regression
The LASSO, a penalized shrunken regression method, has a strong predictive value and low correlation and is applied to select the best features for high-dimensional data. The samples from five brain tissue datasets were randomly assigned to the training set (30%) and validation set (70%). The expression profiles of consensus DEGs were extracted and fit into LASSO logistic regression by the glmnet package. In order to evaluate the ability of the LASSO model to identify AD, receiver operating characteristic (ROC) analysis was completed using the package of pROC in the validation set and combined set (Robin et al., 2011). The area under the curve (AUC) was calculated, and AUC values close to 1 (AUC > 0.7) refer to good classifier models. We also investigated the diagnosis effect of the top 10 hub genes in the combined set.

Evaluate the Diagnosis Model in Peripheral Blood Datasets
As it is hard to obtain brain tissues for diagnosis in clinical practice, we attempted to enroll independent peripheral blood datasets to evaluate the clinical utility of our diagnosis model. We performed ROC analyses in GSE4226 and GSE97760 to examine the ability to differentiate AD from NDHCS. The datasets GSE63060 and GSE63061 were used to verify the accuracy of the model to differentiate MCI and AD from NDHCS. ROC curves were plotted using the "pROC" package.

GSEA and Independent Validation Analysis
GSEA was performed to identify biological process (BP) GO terms of the top 10 hub genes that may be correlated to AD in GSE48350 datasets. We performed GSEA using the R package clusterProfiler for analysis. The c5.bp.v 7.0.symbols.gmt datasets in the MsigDB V 6.2 database 4 were used as reference gene sets, and those with an adjusted p-value < 0.05 after 1,000 permutations were considered significantly enriched gene sets (Subramanian et al., 2005). We determined the key genes by overlapping the hub genes selected from the PPI network and potential diagnosis genes identified from the LASSO regression model. Next, we enrolled another independent dataset (GSE106241) and compared the expression level of key genes in different Braak stages using the K-W test. Moreover, we investigated their associations with β-secretase activity and Aβ 42 levels in AD samples from GSE106241 using the Spearman correlation analysis. p-values less than 0.05 (p < 0.05) were considered significant. The violin plots and correlation analysis in this section were all generated in R 3.6.3.

Identification of Consensus DEGs
Five microarray datasets, including GSE33000, GSE36980, GSE48350, GSE5281, and GSE122063, were downloaded from the National Center of Biotechnology Information-GEO (NCBI-GEO). Details of the five datasets are presented in Table 1. A total of 1,060 samples (495 NDHCS subjects and 565 AD patients) were available for DEG analysis, including microarray data from 10 brain regions: the EC, FC, HPC, medial temporal gyrus, PCGY, posterior cingulate, PFC, primary visual cortex, SFC, and TC. After background correction and normalization, we used the limma package to identify DEGs between NDHCS and AD samples of GEO data. Gene difference analysis found that there were 19,206 DEGs in GSE33000, 3,220 DEGs in GSE36980,8,134 DEGs in GSE48350,7,587 DEGs in GSE5281,and 12,207 DEGs in GSE122063 compared with AD patients and NDHCS (Supplementary Table 1). Volcano plots in Figure 2A show the number of DEGs identified from each of the five datasets. Subsequently, we intersected these DEGs from the five datasets and finally identified 608 common DEGs, of which 179 DEGs were upregulated and 429 DEGs were downregulated ( Figure 2B and Supplementary Table 1). To compare the DEGs between the AD and NDHCS groups, the heatmap showed the expression of common DEGs from five datasets (Supplementary Figure 1). PCA revealed that the expression of common DEGs differed significantly between NDHCS and AD samples in each brain region, indicating that the DEGs we found were common core genes in AD (Supplementary Figure 2).

GO Enrichment and KEGG Pathway Analysis of the DEGs
We performed GO term, KEGG pathway, and functional enrichment analyses to explore the potential biological functions of the common DEGs. The GO annotation results include BP, molecular function (MF), and cellular component (CC). The results revealed that the BP primarily associated with the upregulated genes, including the regulation of the nuclear-transcribed mRNA catabolic process, bleb assembly, detoxification of copper ion, stress response to copper ion, detoxification of inorganic compound, and stress response to metal ion. For CC enrichment analysis, the results showed that upregulated genes significantly took part in the focal adhesion, cell-substrate adherens junction, cell-substrate junction, and cell-cell junction. For MF analysis, upregulated genes are mainly enriched in cell adhesion molecule binding, cadherin binding, molecular adaptor activity, and transcription corepressor activity ( Figure 3A). For downregulated genes, regulation of membrane potential, modulation of chemical synaptic transmission, regulation of transsynaptic signaling, and neurotransmitter transport were dominant BPs. For CC enrichment analysis, downregulated genes mainly take part in presynapse, synaptic membrane, axon part, and glutamatergic synapse. In the enrichment analysis of MF, the downregulated genes mainly revolved in active transmembrane transporter activity, P-Pbond-hydrolysis-driven transmembrane transporter activity, and primary active transmembrane transporter activity ( Figure 3B). The KEGG pathway analysis showed that the upregulated genes were significantly enriched in the Hippo signaling pathway, regulation of actin cytoskeleton, adherens junction, mineral absorption, MAPK signaling pathway, and TGF-beta signaling pathway, while the downregulated DEGs were mainly enriched in the synaptic vesicle cycle, citrate cycle (TCA cycle), Parkinson's FIGURE 5 | Hub gene expression in different Braak stages of AD. The upper, middle, and lower horizontal lines of the box represent the upper, median, and lower quartiles, respectively. Overall differences between groups were tested with the K-W test. Asterisks indicate significant vs. Braak 0 groups; *p < 0.05; **p < 0.01; ***p < 0.001 (dataset GSE48350; n = 91 for NDHCS; n = 2 for Braak stages I and II; n = 15 for Braak stages III and IV; n = 21 for Braak stages V and VI). AD, Alzheimer's disease; NDHCS, non-demented healthy control subjects; HPC, hippocampus; PFC, prefrontal cortex. disease, lysosome, MAPK signaling pathway, and cholinergic synapse. The KEGG pathway enrichment analysis results are illustrated in Figures 3C,D

Specific Dysregulated Pathways for Each Brain Region
We performed GO enrichment analyses using the top 100 upregulated DEGs and top 100 downregulated DEGs of each brain region. Then we identified the specific dysregulated pathways of each brain region. There were two specific pathways in the EC and the PCGY, 20 pathways in the FC, 12 pathways in the HPC, 145 pathways in the medial temporal gyrus, 38 pathways in the posterior cingulate, 47 pathways in the primary visual cortex, four pathways in the TC, and 58 pathways in the PFC (Supplementary Table 2). There was no specific pathway in the SFC. The HPC was associated with the neuron projection organization. The medial temporal gyrus was associated with neuron projection maintenance, neurotransmitter receptor transport to the plasma membrane, neurotransmitter receptor transport to the postsynaptic membrane, and response to Aβ. The dysregulated genes in the posterior cingulate were significantly enriched in pathways, including the branching morphogenesis of a nerve, glutamate metabolic process, positive regulation of synaptic transmission, and glutamatergic pathway. In the primary visual cortex, the dysregulated genes were involved in neuromuscular synaptic transmission, neuroinflammatory response, and cell aging. The neurotransmitter reuptake was associated with TC. Peripheral nervous system development and nerve development were significantly enriched pathways in PFC.

PPI Network Construction, Hub Gene Selection, and Hub Gene Expression in Different Braak Stages of AD
The upregulated and downregulated DEGs were uploaded into the online tool STRING 5 to gain PPI information separately. Based on the degree of connectivity, we constructed the PPI network and selected the top 50 hub genes, and the result was visualized by Cytoscape ( Figure 4A). Next, we used cytoHubba to choose hub genes. According to the five classification methods in cytoHubba, we selected the top 20 hub genes, as shown in Supplementary Table 3. Finally, six upregulated and 12 downregulated central genes were identified by overlapping the first 20 genes (Figure 4B and Supplementary Table 3). Then, we used samples in the dataset GSE48350 to explore whether the 5 http://string-db.org expression levels of these central genes varied in different Braak stages in the HPC and SPF (Supplementary Table 4). We found that five hub genes were upregulated and 11 hub genes were downregulated as AD progressed (p < 0.05). Predominantly, SMAD 4 and YAP 1 were significantly upregulated in Braak III and IV and Braak V and VI compared with Braak 0. Brain-derived neurotrophic factor (BDNF), PSMD14, SLC32A1, SNAP25, and SYP were identified to be downregulated in Braak III and IV and Braak V and VI compared with Braak 0 (Figure 5).

Identification of Potential Biomarkers of AD Using LASSO Logistic Regression
To identify potential biomarkers for AD, we extracted the expression profile of the DEGs and fit them into LASSO logistic regression. We separated all samples (565 AD samples and 495 NDHCS) into training and validation cohorts (Supplementary Table 5). Thirty-five potential predictors in the training cohort were identified and were features with nonzero coefficients in the LASSO logistic regression model (Figures 6A,B and Table 2). Next, we evaluated the ability of the LASSO regression model in differentiating between AD and NDHCS, suggesting that the AUC of the 35-gene-based model was 0.992 in the validation set and 0.985 in the combined set ( Figure 6C). The results indicate that our 35-gene-based diagnosis model can correctly classify AD samples and NDHCS in brain tissues. The diagnosis effect of the top 10 hub genes was also investigated and presented in Supplementary Figure 3. Only BDNF had the ability to differentiate AD from NDHCS (AUC = 0.703).

Evaluation of the Diagnosis Model in Peripheral Blood Datasets
To further discover whether this model is worth using in clinical practice, we test our diagnosis model on four independent peripheral blood datasets (Supplementary Table 5). In the GSE4226, composed of peripheral blood mononuclear cells, only 16 genes of the 35-gene-based model were covered. The ROC analysis was also conducted based on the 16 genes, and the AUC was 0.871 ( Figure 7A). We applied the 35-gene-based model to GSE97760, which included patients with advanced AD and NDHCS and showed the perfect discrimination ability ( Figure 7B, AUC = 1.000). We also found that our diagnosis model had the ability to differentiate MCI and AD from NDHCS in the blood datasets. In GSE63060 (30 genes of the 35-genebased model were covered), the AUC for differentiating MCI and NDHCS is 0.922, and the AUC for AD and NDHCS is 0.837 ( Figure 7C). In GSE63061 (28 genes of the 35-gene-based model were covered), AUC for MCI and NDHCS is 0.763, and AUC for AD and NDHCS is 0.802 ( Figure 7D). Conclusively, the results indicate that our 35-gene-based diagnosis model can classify AD and MCI from NDHCS in peripheral blood.

GSEA and Independent Validation Analysis
To gain new insights into the biological functions of the top 10 genes, we performed GSEA to identify the potential BPs between AD and NDHCS subjects. As indicated in Figures 8A,B, the BDNF was associated with cognition, HPC development, neuron death, regulation of neuronal synaptic plasticity, regulation of neurotransmitter levels, and transport. BPs such as neuroblast proliferation, neuroepithelial cell differentiation, neurotransmitter biosynthetic process and metabolic process, neuroinflammatory response, and regulation of neuroinflammatory response were associated with WW domain-containing transcription regulator protein 1 (WWTR1). The rest of the top 10 hub genes were also involved in several neuron-related pathways, including axon development (SMAD4, SLC32A1, and YAP1), neuron death (SNAP25 and ATP5B), and synapse organization (SMAD4 and SLC32A1). The results are shown in Supplementary Figure 4. BDNF and WWTR1 were identified as key genes by overlapping the 18 hub genes in the PPI network and 35 potential predictors selected from the LASSO regression model. To confirm the result, we enrolled another independent dataset (GSE106241) to conduct validation analysis (Supplementary Table 5). As shown in Figures 8C,D, BDNF and WWTR1 showed significant differences among the different Braak stages (p = 0.001 and 0.041, respectively). Moreover, we found that BDNF was negatively associated with Aβ 42 levels and β-secretase activity (R = -0.35 and R = -0.42, respectively, Figures 8E,F). WWTR1 was positively associated with Aβ 42 levels and β-secretase activity (R = 0.36 and R = 0.61, respectively, Figures 8G,H). This result further proves that the two genes are essential and involved in the pathology of AD.

DISCUSSION
Bioinformatics analysis has developed rapidly and applied to many diseases in recent decades, revealing the complex pathogenesis and identifying new biomarkers for diagnosis and treatment . Nevertheless, integrated bioinformatics analysis has not yet been systematically used in AD. Previous researches were usually based on single datasets and few samples that may weaken the credibility of the results. However, our current study has recruited five open public datasets for DEGs to significantly improve the number of samples (495 NDHCS vs. 656 AD samples). Thus, our study provides more credible and trustworthy results. We performed a series of integrative analyses based on DEGs, including GO and KEGG enrichment analyses, a constructed PPI network, LASSO logistic regression, and GSEA. In this way, we provide valuable clues for investigating the molecular mechanisms underlying the initiation and development of AD.
The KEGG pathway enrichment analysis showed significant enrichment in pathways including the Hippo signaling pathway, the TGF-beta signaling pathway, the MAPK signaling pathway, the synaptic vesicle cycle, lysosome, and the cholinergic synapse. A recent study confirmed that the Hippo pathway is associated with the pathogenesis of AD. The precursor of Aβ can promote the nuclear translocation of FOXO 3a by inducing MST1-dependent phosphorylation of Foxo 3a. The MST-Foxo pathway, which is considered a branch of the Hippo pathway, activates a proapoptotic member of the Bcl-2 family and triggers an intrinsic apoptotic pathway, resulting in neuronal death (Wang and Wang, 2016). Regarding the TGF-beta signaling pathway, previous studies showed that the expression of TGF-β 1 and TGF-β 2 increased in the brains of patients with AD (Zetterberg et al., 2004). Given the current evidence of microglial dysfunction in neurodegeneration, we speculate that changes in brain TGF-β signaling in AD could alter microglial state and trigger their pathogenic functions (Salter and Stevens, 2017). It is well documented that lysosomal dysfunction is a prominent feature in AD brains, resulting in a failure to clear accumulated protein aggregates and contributes to the process of the pathogenesis of AD (Fraldi et al., 2016). Similarly, the MAPK signal pathways are activated in vulnerable brain regions of AD patients and are involved in the progress of AD (Zhu et al., 2002;Guillot et al., 2016). Therefore, the MAPKs have been proposed as therapeutic targets for AD. Previous studies also confirm that cholinergic transmission impairments are correlated with the neuropathological stage of AD (Amberla et al., 1993). A decrease in the cholinergic activity and disruption of synaptic function contribute to memory impairment (Sayer et al., 2004). Furthermore, the GO analysis indicated that the DEGs are involved in a wide range of BPs and have different MFs. We also found several specific dysregulated pathways in each brain region. Our results indicated that the changes in BPs, CCs, MFs, and pathways might play critically important roles in the pathogenesis of AD.
We identified 18 hub genes by overlapping five sequencing methods in cytoHubba, of which 16 hub genes were significantly dysregulated as AD progressed. Some of these genes have been previously reported to be associated with AD. For example, the reduction of SNAP25 causes postsynaptic loss and learning and memory impairment (Ren et al., 2018). A recent study has demonstrated that SNAP25 is an effective biomarker for predicting AD 5-7 years before cognitive impairment . SYP can affect the synaptic structure and neurotransmitter release to regulate synaptic plasticity (Zhu et al., 2019). The dysfunction of ATP5B is associated with neurofibrillary tangle burden in the AD brain and with cognition (Wang et al., 2017).
The decrease of BDNF correlates with the neuropathological stage of AD (Laske et al., 2006). Further studies are required to investigate their features, functions, and mechanisms.
In the present study, we constructed a 35 gene-based LASSO model, which can accurately predict AD in both validation and combined brain tissue datasets. Among the 35 genes, previous studies have reported that the expressions of IFITM3 (Correani et al., 2017), SORBS1 (Blalock et al., 2004), ENAH (de Oliveira-Júnior et al., 2015), SST (Solarski et al., 2018), ENSA (Boettcher et al., 2008), C14orf40 (Chung et al., 2018), BASP1 (Zhou et al., 2020), RTN3 (Zou et al., 2018), CDK7 (Zhu et al., 2000), and HPCA (Jiang et al., 2016) were associated with AD or have functions in neural tissue, indicating possible therapeutic targets. For example, IFITM3 is a reliable biomarker of the inflammatory microglial phenotype in AD damaged tissues (Correani et al., 2017), and the expression of SORBS1 is higher, while the expression of ENSA is lower in the brain of patients with AD (Blalock et al., 2004;Boettcher et al., 2008). SST interferes with Aβ fibrillization and promotes the formation of Aβ assemblies (Solarski et al., 2018). However, the molecular mechanism of these 35 genes contributing to AD pathogenesis is still poorly understood, and further exploration of potential mechanisms may be valuable.
To assess and confirm its clinical application value, we validated our gene signature on peripheral blood. According to ROC curves, the 35-gene-based model is able to distinguish MCI and AD samples from NDHCS samples in the blood. Especially in GSE97760, the model shows the perfect ability to select advanced AD patients (AUC = 1.000). Our results indicate that this diagnosis model could be beneficial for clinical applications. It is known that the blood-brain barrier (BBB) controls substance exchange strictly between the brain and blood. However, studies indicate that the breakdown of BBB could enhance the movement of proteins between the brain and blood in either direction (Zipser et al., 2007). Thus, some proteins in the blood might be associated with AD pathology. Recent studies have also adopted the strategy to integrate brain and blood datasets to identify potential AD biomarkers (Yao et al., 2018;Zhu et al., 2020). Our results shed new light on diagnosis biomarker identification. Further large-sample studies with a different analysis in the blood are required to confirm our results.
This study detected two key genes, BDNF and WWTR1, as potential biomarkers for clinical diagnosis and therapeutic monitoring in AD. Both of them have been identified as hub genes in the PPI network and varied significantly in the different neuropathological stages of AD. They were also selected as potential diagnostic biomarkers from LASSO logistic regression. GSEA suggests that the BDNF and WWTR1 could function as key players in a broad array of essential signaling pathways. Moreover, an independent validation showed that BDNF and WWTR1 are associated with the Braak stage, Aβ 42 levels, and β-secretase activity.
BDNF, located on chromosome 11 p 14, is reported to play an essential role in regulating neurodevelopment, promoting neuronal survival, and supporting basal forebrain cholinergic projections to the HPC and neocortex (Marmigère et al., 2003;Huang et al., 2007). Previous studies indicate that BDNF depletion led to an increase in cortical amyloid plaque numbers and size (Braun et al., 2017). It has also been reported that the expression of BDNF decreased in the brain tissue of patients with AD (Connor and Dragunow, 1998;Fields et al., 2014). This is consistent with our results that BDNF is negatively correlated with the Braak stage, and GSEA suggested that it is involved in HPC development, cognition, neuron death, and neurotransmitter regulation. Thus, the downregulation of BDNF may play a crucial role in the pathogenesis of AD. WWTR1 is another potential diagnosis biomarker and may contribute to the development of AD. Functional enrichment analysis showed that WWTR1 was significantly involved in the neuroinflammatory response and neurotransmitter biosynthetic and metabolic pathways. Previous research indicates that WWTR1 is playing a crucial role in the Hippo and TGF-beta pathways, which is associated with the progress of AD (Lei et al., 2008;Deiana et al., 2018). However, the mechanisms of WWTR1 and AD remain undefined. More research is needed to elucidate the functions and underlying mechanisms of WWTR1 and AD. Although we enrolled an additional dataset for external validation, high-quality validation experiments are still required to prove the value of BDNF and WWTR1 in AD pathology.

CONCLUSION
In conclusion, we identified 608 consensus DEGs, several dysregulated pathways, and 16 hub genes associated with AD progress by a series of bioinformatics analyses. The diagnostic model of 35 genes was constructed, which has a high AUC value in not only brain tissue but also peripheral blood. BDNF and WWTR1 were identified as candidate genes for future molecular studies. Our current study deepens our understanding of underlying molecular mechanisms in AD and provides new potential diagnostic and therapeutic biomarkers.

DATA AVAILABILITY STATEMENT
The datasets generated for this study can be found in the online repositories. The names of the repository/repositories and accession number(s) can be found in the article/Supplementary Material.

AUTHOR CONTRIBUTIONS
WuY and YL designed the study. WuY, WeY, and YY performed the statistical analysis and wrote the manuscript. All authors reviewed the manuscript and gave the final approval for publication.