Artificial intelligence and multi-omics nominate TAZ as an insomnia-related diagnostic and druggable target for Parkinson’s disease patients

Ma, Wenjing

doi:10.3389/fnagi.2026.1727472

ORIGINAL RESEARCH article

Front. Aging Neurosci., 04 February 2026

Sec. Parkinson’s Disease and Aging-related Movement Disorders

Volume 18 - 2026 | https://doi.org/10.3389/fnagi.2026.1727472

Wenjing Ma ^*

Beijing Chaoyang Gaobeidian Community Health Service Center, Beijing, China

Abstract

Background:

Insomnia is one of the most common non-motor comorbidities of Parkinson’s disease (PD) and often precedes the onset of motor symptoms. Identifying the molecular mechanisms of insomnia may facilitate the early diagnosis of PD and contribute to therapeutic development.

Methods:

Five human PD substantia nigra (SN) bulk-seq datasets (GSE20141, GSE7621, GSE20164, GSE20163, and GSE20333) and an insomnia-related gene list were acquired from the GEO and Genecard databases, respectively. First, the integrated GSE20141 and GSE7621 dataset was analyzed to identify insomnia-related differentially expressed genes (DEGs) using limma and the WGCNA framework. The combined GSE20164 and GSE20163 dataset was used as a training set for insomnia-related hub gene identification. The aforementioned four datasets, along with an independent validation set (GSE20333), were then used to cross-validate the insomnia-related diagnostic model. The human PD-SN single-cell profile (GSE140231) was used to explore the mechanisms underlying the heterogeneity of insomnia-related hub genes in spatial and temporal contexts. In addition, an artificial intelligence (AI)-driven framework (DrugRefLector) and molecular docking were used to identify an optimal agent for the treatment of PD based on the integrated GSE20164 and GSE20163 dataset. Finally, an in vitro q-RT-PCR experiment was conducted to measure the expression of the target gene.

Results:

TAZ (WWTR1) was identified as an upregulated insomnia-related diagnostic marker linked to PD pathogenesis, expressed mainly in neurons, with excellent predictive performance for PD diagnosis. Furthermore, BRD-K98481123 can be considered a potential therapeutic agent for the treatment of PD by targeting TAZ.

Conclusion:

By integrating AI pipelines and multi-omics, our study is the first to trace TAZ mechanisms in PD pathogenesis and to elaborate on the predictive and druggable potential of TAZ for PD patients.

1 Introduction

Parkinson’s disease (PD) is a common, heterogeneous neurodegenerative disorder with a rising global prevalence (Tolosa et al., 2021). PD is characterized by the progressive degeneration of dopaminergic neurons in the substantia nigra (SN) and the formation of pathological α-synuclein aggregates, known as Lewy bodies (Hayes, 2019). The clinical features of PD encompass motor symptoms, such as tremors and rigidity, and non-motor symptoms, such as olfactory impairment, depressive complications, and insomnia (Kalia and Lang, 2015). PD arises from a multifaceted interaction of genetic predisposition and environmental factors (Bloem et al., 2021). Although advances have been made in elucidating the pathophysiology of PD, clinical diagnosis and therapeutic intervention remain inadequate and face considerable obstacles, such as the heterogeneity of the disease (Sveinbjornsdottir, 2016). Insomnia is one of several common non-motor symptoms of PD, affecting a majority of patients and often co-existing with circadian dysregulation and excessive daytime sleepiness (Henderson, 2022). Insomnia is also considered a common comorbidity among PD patients and is caused by degeneration of the neural structures that modulate sleep (Iranzo et al., 2024). At the molecular level, classical PD phenotypes, such as neuroinflammation, mitochondrial stress, and impaired proteostasis, are central to PD pathogenesis and can be modulated by sleep and circadian homeostasis (Iranzo et al., 2024). Hence, deciphering the cellular and molecular basis of insomnia in PD may facilitate the development of novel therapeutic strategies aimed at improving insomnia and increasing the quality of life of PD patients.

In this study, we used artificial intelligence (AI) with integrative bioinformatics pipelines and multi-omics to discover molecular mechanisms and therapeutic strategies for the treatment of PD. First, the internal public bulk-seq dataset (integrated GSE20141 and GSE7621 bulk profile, including 17 control and 26 PD SN samples) was analyzed to identify co-expressed differentially expressed genes (DEGs) using limma and WGCNA. The insomnia-related gene list was intersected with the co-expressed DEGs to yield insomnia-related DEGs (IDEGs). Based on the IDEGs, the public training bulk profile (integrated GSE20163 and GSE20164, including 15 control and 14 PD SN samples) was used to identify insomnia-associated hub genes using three machine learning algorithms (random forest [RF], least absolute shrinkage and selection operator [LASSO], and support vector machine [SVM]). The results indicated that TAZ can be considered a hub gene involved in PD pathogenesis. An independent validation set (GSE20333, including six control and six PD SN samples), along with the internal and training sets, was used to evaluate the diagnostic performance of TAZ for PD using receiver operating characteristic (ROC), precision-recall (PR), decision curve analysis (DCA), nomogram, and calibration analyses. The results indicated that TAZ could be considered a favorable diagnostic biomarker for PD onset. Furthermore, single-cell analysis revealed that TAZ is mainly distributed in neurons and associated with neuronal differentiation and PD pathogenesis. In addition, an AI-driven therapeutic screening framework, applied to the PD training bulk profile, identified BRD-K98481123 as a potential therapeutic agent for PD treatment by targeting TAZ. Finally, in vitro studies demonstrated that TAZ was upregulated in MPP⁺-treated SH-SY5Y cells. Our study is the first to report the diagnostic role of insomnia-related TAZ in PD and offers novel clinical therapeutic strategies. The workflow of this study is shown in Figure 1.

Figure 1

Flowchart outlining a research process. It starts with combining GSE20141 and GSE7621 datasets for Limma and WGCNA analyses. Key variables lead to insomnia DEGs identification with intersection from the Genecard database. Machine learning and immune feature analysis follow for hub gene identification. Next steps include drug prediction and diagnostic performance estimation. The process concludes with TAZ performance and experimental validation, featuring various analyses like pseudotime, cell talk, and spatial distribution. — Workflow of this study.

2 Materials and methods

2.1 Source of data

We first downloaded five SN microarray datasets of PD patients and the corresponding clinical information from the Gene Expression Omnibus (GEO) database using the GEOquery package of R software: GSE20141, GSE7621, GSE20163, GSE20164, and GSE20333 (Davis and Meltzer, 2007). We integrated GSE20141 and GSE7621 (both based on GPL570) using the sva package of R software to remove batch effects, and treated the integrated dataset as the internal set (Leek et al., 2012). Next, GSE20163 and GSE20164 (both based on GPL96) were integrated following the same rules and used as the training set. GSE20333 (based on GPL201) served as an independent validation set. All datasets were normalized and standardized using the limma package of R (Ritchie et al., 2015). The insomnia-related gene list was acquired from the Genecard database with a threshold > 1 (Xu et al., 2023).

2.2 Identification of DEGs and WGCNA analysis

Differential expression analysis was performed on the internal set using the limma package of R software (Ritchie et al., 2015). DEGs were identified using thresholds of |log₂FC| > 1 and p < 0.05 and visualized with a volcano plot and a heatmap using the ggplot2 and ComplexHeatmap packages (Gustavsson et al., 2022; Gu, 2022). Furthermore, we used the WGCNA package in R software to investigate the association between genes and phenotypes by constructing a gene co-expression network in the internal set (Langfelder and Horvath, 2008). Initially, we excluded the 50% of genes with the lowest median absolute deviation (MAD) (Langfelder and Horvath, 2008). Following this, we calculated Pearson’s correlation matrices for all possible gene pairs and constructed a weighted adjacency matrix by applying the average linkage method in conjunction with a weighted correlation coefficient (Langfelder and Horvath, 2008). The “soft” thresholding power (β) was then used to compute the adjacency, which was converted into a topological overlap matrix (TOM) (Langfelder and Horvath, 2008). To group genes with similar expression patterns into modules, we performed average linkage hierarchical clustering based on the dissimilarity metric derived from the TOM, with a minimum module size of 50 genes (Langfelder and Horvath, 2008). Finally, we evaluated the dissimilarity of module eigengenes, established a cutoff for the module dendrogram, and merged several modules. WGCNA was used to identify significant modules associated with PD, resulting in a visualized eigengene network. In addition, the insomnia-related gene list was intersected with the DEGs obtained from limma and the co-expression module obtained from WGCNA to identify IDEGs, which were visualized in a Venn diagram generated by R software (Jia et al., 2021).
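
The adjacency and TOM steps described above can be illustrated with a minimal sketch. The authors used the WGCNA package in R; the Python below is only an illustration that assumes an unsigned network (adjacency = |Pearson correlation|^β) and the standard topological overlap formula. The function names and the three toy expression vectors are hypothetical and are not data from this study.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long, non-constant vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def soft_threshold_adjacency(expr, beta):
    """Unsigned adjacency a_ij = |cor(i, j)|**beta, with a zero diagonal."""
    n = len(expr)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                adj[i][j] = abs(pearson(expr[i], expr[j])) ** beta
    return adj


def topological_overlap(adj):
    """TOM_ij = (sum_u a_iu * a_uj + a_ij) / (min(k_i, k_j) + 1 - a_ij)."""
    n = len(adj)
    connectivity = [sum(row) for row in adj]  # diagonal is zero
    tom = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            shared = sum(adj[i][u] * adj[u][j]
                         for u in range(n) if u not in (i, j))
            denom = min(connectivity[i], connectivity[j]) + 1.0 - adj[i][j]
            tom[i][j] = (shared + adj[i][j]) / denom
    return tom


# Toy expression vectors (hypothetical): genes 0 and 1 are co-expressed.
toy_expr = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [1.0, -1.0, 1.0, -1.0],
]
toy_adj = soft_threshold_adjacency(toy_expr, beta=4)
toy_tom = topological_overlap(toy_adj)
# Dissimilarity used for hierarchical clustering is 1 - TOM.
toy_dissimilarity = [[1.0 - v for v in row] for row in toy_tom]
```

Raising the correlation to a power suppresses weak correlations while preserving strong ones, so the co-expressed pair keeps a high overlap and the unrelated gene is pushed toward zero.
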
Furthermore, KEGG and GO functional enrichment analyses of the IDEGs were performed using the clusterProfiler package in R with a threshold of FDR < 0.05, in accordance with the hallmark gene set downloaded from the MSigDB database (Yu et al., 2012; Liberzon et al., 2015).
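
The thresholding and intersection logic of this subsection can be sketched as follows. This is an illustrative Python outline and not the authors' R code: it assumes per-gene statistics are already available as (log₂FC, p-value) pairs, and the gene labels, values, and function names are hypothetical placeholders.

```python
def select_degs(stats, lfc_cutoff=1.0, p_cutoff=0.05):
    """Split genes into up- and down-regulated DEGs.

    stats maps gene -> (log2 fold change, p-value); a gene is a DEG when
    |log2FC| > lfc_cutoff and p < p_cutoff, the thresholds given in the text.
    """
    up = {g for g, (lfc, p) in stats.items() if lfc > lfc_cutoff and p < p_cutoff}
    down = {g for g, (lfc, p) in stats.items() if lfc < -lfc_cutoff and p < p_cutoff}
    return up, down


def insomnia_related_degs(degs, module_genes, insomnia_genes):
    """IDEGs = DEGs that are also in the PD-associated module and the insomnia list."""
    return set(degs) & set(module_genes) & set(insomnia_genes)


# Hypothetical toy input, not results from this study.
toy_stats = {
    "GENE_A": (1.8, 0.001),
    "GENE_B": (-1.4, 0.010),
    "GENE_C": (0.6, 0.001),   # fold change too small
    "GENE_D": (2.1, 0.200),   # not significant
    "GENE_E": (1.2, 0.030),
}
toy_up, toy_down = select_degs(toy_stats)
toy_idegs = insomnia_related_degs(
    toy_up | toy_down,
    module_genes={"GENE_A", "GENE_B", "GENE_C"},
    insomnia_genes={"GENE_A", "GENE_C", "GENE_E"},
)
```

Requiring membership in all three sets is deliberately conservative: a gene must be differentially expressed, co-expressed with the disease-associated module, and annotated as insomnia-related.
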

2.3 Machine learning algorithms and diagnostic model construction

LASSO logistic regression is a regularized regression technique that applies an L1 penalty (lambda) to shrink the coefficients of less informative variables to zero (Kang et al., 2021). This enables the identification of significant variables and the development of an optimal classification model. SVM-RFE is a supervised machine learning method that identifies the most important core genes by recursively eliminating the lowest-ranked features of a support vector machine (Guan et al., 2024). Random forest (RF) is a decision tree-based machine learning method that scores the importance of each variable (Wallace et al., 2023). By combining these three machine learning algorithms on the training set, we identified the insomnia-related hub variable involved in PD pathogenesis. Next, the molecular function of the hub variable was assessed in the training set using single-gene GSEA with the hallmark gene set downloaded from the MSigDB database via the clusterProfiler package of R (Yu et al., 2012). The immune features of the hub variable were estimated with the CIBERSORT algorithm in R (Chen et al., 2018). The expression and diagnostic value of the hub variable were then cross-validated in the internal, training, and independent validation sets. Diagnostic performance of the hub variable was evaluated by ROC, PR, DCA, nomogram, and calibration analyses using the pROC, rms, and rmda packages of R software (Robin et al., 2011; Lin et al., 2024; Liu et al., 2024).
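
Two steps in this subsection lend themselves to a short illustration: taking the consensus of the three feature selectors, and scoring a single-gene classifier by the area under the ROC curve. The sketch below is in Python rather than the R packages the authors used. It assumes each selector has already returned a set of gene names, and it computes the AUC with the rank-based (Mann-Whitney) formulation. All names and values are hypothetical.

```python
def consensus_features(*selected_sets):
    """Genes retained by every feature-selection method."""
    if not selected_sets:
        return set()
    return set.intersection(*(set(s) for s in selected_sets))


def roc_auc(scores, labels):
    """AUC as the probability that a random case scores above a random control.

    labels: 1 = PD, 0 = control. Ties count as half a win.
    """
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    if not pos or not neg:
        raise ValueError("both classes are required to compute an AUC")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical selector outputs and expression values, not study data.
toy_hub = consensus_features(
    {"GENE_A", "GENE_B"},             # e.g. LASSO
    {"GENE_A", "GENE_C"},             # e.g. RF
    {"GENE_A", "GENE_B", "GENE_C"},   # e.g. SVM-RFE
)
toy_auc = roc_auc(scores=[0.9, 0.4, 0.8, 0.3], labels=[1, 1, 0, 0])
```

The pairwise formulation is quadratic in sample size, which is acceptable for cohorts of a few dozen samples such as those described here.
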

2.4 Single-cell transcriptomic analysis

We retrieved the single-cell transcriptomic dataset of the SN in PD patients, GSE140231, from the GEO database. Analysis of the single-cell RNA sequencing (scRNA-seq) data involved quality control (QC), dimensionality reduction, and marker identification, all performed using the Seurat R package (Butler et al., 2018). Each cell was required to meet predefined QC criteria: 200 to 6,000 detected genes, more than 1,000 unique molecular identifiers (UMIs), and less than 10% mitochondrial gene content (Butler et al., 2018). After QC, the data were normalized, and 2,000 highly variable genes were selected for further analysis (Butler et al., 2018). Following normalization, the dimensionality reduction methods t-SNE and UMAP were applied. Cell types were annotated using the scMayoMap algorithm in R (Yang et al., 2023). We evaluated the expression levels of the target genes across the annotated cell populations. Intercellular communication networks were inferred using the CellTalker package in R (Barut et al., 2022). Furthermore, we investigated energy metabolic pathways at the single-cell level among the annotated cell populations using the scMetabolism package in R (Argüello et al., 2020). Pseudotime analysis of target gene expression within specific cell types was performed in both temporal and spatial contexts using the monocle2 package in R (Fang et al., 2022). Finally, scTenifoldKnk was used to perform an in silico knockout (KO) of the hub gene in the targeted cell type (Osorio et al., 2022).
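
The per-cell QC criteria above amount to a simple filter. The authors applied them with Seurat in R; the Python sketch below only illustrates the rule. It assumes the gene-count bounds are inclusive (the text does not say), and the `CellQC` record, barcodes, and values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CellQC:
    barcode: str
    n_genes: int      # number of detected genes
    n_umis: int       # total UMI count
    pct_mito: float   # percentage of counts from mitochondrial genes


def passes_qc(cell, min_genes=200, max_genes=6000, min_umis=1000, max_pct_mito=10.0):
    """Apply the thresholds from the text; inclusive gene bounds are an assumption."""
    return (min_genes <= cell.n_genes <= max_genes
            and cell.n_umis > min_umis
            and cell.pct_mito < max_pct_mito)


def filter_cells(cells, **thresholds):
    """Return only the cells that satisfy every QC criterion."""
    return [c for c in cells if passes_qc(c, **thresholds)]


# Hypothetical cells, not taken from GSE140231.
toy_cells = [
    CellQC("cell_1", n_genes=1500, n_umis=4000, pct_mito=3.2),
    CellQC("cell_2", n_genes=150, n_umis=4000, pct_mito=3.2),    # too few genes
    CellQC("cell_3", n_genes=1500, n_umis=900, pct_mito=3.2),    # too few UMIs
    CellQC("cell_4", n_genes=1500, n_umis=4000, pct_mito=12.5),  # high mito content
]
toy_kept = [c.barcode for c in filter_cells(toy_cells)]
```
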

2.5 AI-driven drug prediction and molecular docking

The DrugRefLector framework, which applies active learning to transcriptomic data, was used to identify modulators associated with disease phenotypes (DeMeo et al., 2025). Using the integrated GSE20164 and GSE20163 bulk profiles, we applied DrugRefLector to identify optimal therapeutic agents for alleviating PD (DeMeo et al., 2025). To evaluate the binding affinity of the optimal therapeutic agent to the protein encoded by the hub gene, we conducted molecular docking studies to examine the interactions between the selected drug and its target protein (Wang et al., 2022). The Protein Data Bank (PDB) file for the target protein (PDB ID: 5N75) was obtained from the RCSB PDB repository, and the ligand SDF file (PubChem ID: 44620789) was sourced from the PubChem database (Berman et al., 2000; Kim, 2016). First, PyMOL software (Version 2.6.0) was used to remove water molecules and ligands, retaining the protein backbone (Ji et al., 2023). Subsequently, the AutoDock Vina Tool (Version 4.2.6) was used to identify potential binding sites on the protein surface and to perform flexible molecular docking (Ji et al., 2023). Docking scores and binding affinities (expressed as Vina scores in kcal/mol) were calculated for each identified binding site (Ji et al., 2023). We ranked the top five binding sites by calculated binding energy and selected the site with the lowest energy for visualization in PyMOL, which was used to illustrate the binding mode and the hydrogen bonds involved in ligand binding (Ji et al., 2023).
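
The final selection step, ranking candidate binding sites by Vina score and keeping the lowest-energy one, can be sketched as below. This Python fragment is illustrative only: it does not run any docking, and the site labels and scores are placeholder values rather than results from this study. A more negative score is taken to mean stronger predicted binding.

```python
def rank_binding_sites(sites, top_n=5):
    """Sort sites by Vina score (kcal/mol), most negative first, and keep top_n."""
    return sorted(sites, key=lambda site: site["vina_score"])[:top_n]


def best_binding_site(sites):
    """Site with the lowest (most favorable) predicted binding energy."""
    if not sites:
        raise ValueError("no binding sites supplied")
    return rank_binding_sites(sites, top_n=1)[0]


# Placeholder docking output (hypothetical labels and scores).
toy_sites = [
    {"site": "site_a", "vina_score": -6.1},
    {"site": "site_b", "vina_score": -8.4},
    {"site": "site_c", "vina_score": -5.0},
    {"site": "site_d", "vina_score": -7.2},
    {"site": "site_e", "vina_score": -4.3},
    {"site": "site_f", "vina_score": -6.8},
]
toy_top5 = [s["site"] for s in rank_binding_sites(toy_sites)]
toy_best = best_binding_site(toy_sites)["site"]
```
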

2.6 Cell lines and culture conditions

Authenticated human dopaminergic (DA) neuron SH-SY5Y cells were obtained from the Shanghai Institute of Cell Biology (Shanghai, China). These cells were cultured in Dulbecco’s modified Eagle medium (DMEM), supplemented with 10% fetal bovine serum (FBS) and 1% penicillin–streptomycin. The cultures were maintained at 37 °C in a humidified incubator containing 5% CO₂ (Quan et al., 2024). The medium was refreshed every 2–3 days, and the cells were passaged when they reached approximately 80% confluence (Quan et al., 2024). To simulate neuronal injury, SH-SY5Y cells were exposed to ultrapure MPP⁺ (200 mM; #D048, Sigma-Aldrich, St. Louis, MO, USA) for 24 h at 37 °C. MPP⁺-treated SH-SY5Y cells served as the PD model, and untreated SH-SY5Y cells served as the normal control (Quan et al., 2024).

2.7 Quantitative real-time PCR (qRT-PCR)

Total RNA was isolated using the TRIzol reagent (TaKaRa, Beijing, China), and the subsequent analysis of its concentration, purity, and integrity was performed utilizing a NanoDrop spectrophotometer (Thermo Scientific, Waltham, MA, USA) (Zhou et al., 2024). Reverse transcription was executed with 1 μg of total RNA using HiScript II Q RT SuperMix for qPCR (+gDNA wiper) alongside a gDNA eraser (Vazyme, Shanghai, China) (Zhou et al., 2024). The concentration, purity, and integrity of the resultant cDNA were also assessed using a NanoDrop spectrophotometer (Thermo Scientific, Waltham, MA, USA) (Zhou et al., 2024). Quantitative reverse transcription PCR (qRT-PCR) was carried out with SYBR Green MasterMix (11203ES50, YEASEN, Shanghai, China) and analyzed through StepOne Software v.2.3 (Applied Biosystems, Carlsbad, CA, USA), incorporating 40 amplification cycles (three biological replicates) (Zhou et al., 2024). Data analysis was performed using the ∆∆Ct (cycle threshold) method, with normalization to the expression levels of the reference gene, GAPDH. The primer sequences for the target gene were as follows:

TAZ:

Forward: 5′-GACCCCAGACATGAGATCCA-3′.

Reverse: 5′-CCTGCGTTTTCTCCTGTATCC-3′.

GAPDH:

Forward: 5′-GAGAAGGCTGGGGCTCATTT-3′.

Reverse: 5′-ATGACGAACATGGGGGCATC-3′.
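
The ∆∆Ct calculation named in this subsection can be written out as a short sketch. It assumes the common 2^(−∆∆Ct) form with roughly 100% amplification efficiency for both the target (TAZ) and the reference gene (GAPDH), which the text does not state explicitly. The Ct values and function name are hypothetical.

```python
from statistics import mean


def relative_expression(target_ct_treated, ref_ct_treated,
                        target_ct_control, ref_ct_control):
    """Fold change of the target gene in treated versus control cells.

    dCt  = mean Ct(target) - mean Ct(reference), per condition
    ddCt = dCt(treated) - dCt(control)
    fold = 2 ** (-ddCt), assuming ~100% amplification efficiency
    """
    dct_treated = mean(target_ct_treated) - mean(ref_ct_treated)
    dct_control = mean(target_ct_control) - mean(ref_ct_control)
    ddct = dct_treated - dct_control
    return 2.0 ** (-ddct)


# Hypothetical Ct values for three biological replicates per condition.
toy_fold = relative_expression(
    target_ct_treated=[24.0, 24.0, 24.0],
    ref_ct_treated=[18.0, 18.0, 18.0],
    target_ct_control=[25.0, 25.0, 25.0],
    ref_ct_control=[18.0, 18.0, 18.0],
)
```

A lower Ct means earlier amplification, so a target Ct that drops by one cycle relative to the reference corresponds to a two-fold increase in expression.
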

2.8 Statistical analysis

All statistical analyses were performed in R software (Version 4.2.2) and GraphPad Prism (Version 9.0). Differences between two groups were assessed using Student’s t-test or the Wilcoxon rank-sum test, depending on data distribution. Comparisons among multiple groups were conducted using Student’s t-test and one-way ANOVA, followed by Tukey’s post-hoc test. Correlations between gene expression and immune cell infiltration were evaluated using Spearman’s correlation analysis. A two-tailed p-value of < 0.05 was considered statistically significant.

3 Results

3.1 Identification of proposed variables in PD patients using limma and WGCNA analysis

To identify candidate genes associated with Parkinson’s disease (PD), we first performed quality control and normalization of the internal set. Principal component analysis (PCA) revealed a clear separation trend between PD patients and healthy controls (Supplementary Figure S1A). Boxplots after normalization confirmed consistent expression distributions across samples (Supplementary Figure S1B). A total of 1,053 DEGs were identified, including 850 upregulated and 203 downregulated genes (Figure 2A). Heatmap analysis further illustrated distinct expression profiles between PD patients and controls (Figure 2B). Using weighted gene co-expression network analysis (WGCNA), a soft-threshold power of β = 4 was selected to achieve a scale-free topology (Supplementary Figure S1D). Hierarchical clustering of genes allowed the construction of multiple co-expression modules (Supplementary Figure S1C), and their interrelationships were further visualized through a module clustering heatmap (Supplementary Figure S1E). By correlating co-expression modules, we found that the green-yellow modules were most strongly associated with PD (Figures 2C,D). To enhance robustness, we intersected the DEGs obtained from limma and PD-associated WGCNA modules with the insomnia-related gene list, and identified 5 overlapping genes (Figures 2E,G). Functional enrichment analysis revealed that these intersected genes were significantly enriched in biological processes such as nucleotide metabolic processes, purine nucleotide metabolism, and glycolytic pathways, while molecular function analysis highlighted their roles in ligand-gated ion channel activity and cholinergic receptor activity (Figure 2F).

Figure 2

A series of data visualizations: A) A volcano plot with genes plotted by fold change and p-value, showing 203 down-regulated (pink) and 850 up-regulated (blue) genes. B) A heatmap displaying expression levels with hierarchical clustering, classifying samples into control and PD groups. C) A correlation plot showing group comparisons with varying correlation coefficients and p-values. D) Scatter plots with module memberships and significance values, highlighting data distributions. E) Venn diagram comparing WGCNA and Limma results, indicating shared and unique gene sets. F) A dot plot of enriched gene ontology terms, categorized into biological processes (BP) and molecular functions (MF). G) Another Venn diagram comparing differentially expressed genes (DEGs) and insomnia-related genes. — Identification of insomnia-associated DEGs in PD patients. **(A)** Volcano plot shows DEGs between PD patients and healthy controls in the internal set. **(B)** Heatmap illustrates the expression patterns of distinct DEGs in the internal set. **(C)** Module trait relationship heatmap generated by WGCNA in the internal set. **(D)** Scatterplots of representative modules associated with PD and control in the internal set. **(E)** Venn diagram depicts the overlap between DEGs identified by limma and the co-expression gene module derived from WGCNA. **(F)** KEGG and GO enrichment analysis of insomnia-related DEGs. **(G)** Venn diagram shows the intersection between targeted DEGs and the insomnia gene list.

3.2 Insomnia-related diagnostic signature and hub gene identification for PD patients using machine learning

To identify an insomnia-related hub gene linked with PD, we applied LASSO, RF, and SVM-RFE for hub variable identification (Figures 3A–C). Integrating the three methods revealed one overlapping hub gene, TAZ (Figure 3D). To explore the molecular role of TAZ, we performed single-gene GSEA. The results demonstrated that TAZ was significantly associated with signaling pathways including complement and immune response, WNT/β-catenin signaling, and mitotic spindle regulation (Figure 3E). Furthermore, CIBERSORT immune cell infiltration analysis indicated that TAZ expression was negatively correlated with the proportion of resting dendritic cells (Figure 3F).

Figure 3

A composite image with multiple panels depicting various data visualizations. Panel A shows a Lasso regression plot with coefficients against Log(λ) and a misclassification error plot. Panel B features scatter plots illustrating the mean decrease in Gini and accuracy for various genes. Panel C presents line graphs showing ten-fold cross-validation accuracy and error against the number of features. Panel D is a Venn diagram comparing Lasso, RF, and SVM methods with overlapping areas. Panel E displays an enrichment plot with pathways such as COMPLIMENT and WNT_BETA_CATENIN_SIGNALING. Panel F includes a bar graph showing immune cell fractions with TAZ levels, with red bars for high TAZ and blue for low TAZ. — Identification of an insomnia-related hub gene for PD using machine learning approaches. (A) LASSO regression analysis for feature selection among candidate DEGs. **(B)** RF analysis ranks the importance of candidate genes. **(C)** SVM-RFE curves show cross-validation accuracy and error rate across different feature subsets. **(D)** Venn diagram illustrates the intersection of candidate genes identified by LASSO, RF, and SVM. **(E)** Single-gene GSEA analysis of TAZ. **(F)** CIBERSORT-based immune cell infiltration analysis of TAZ.

3.3 Cross-validation of insomnia-related diagnostic model performance in PD patients

We next validated the diagnostic performance of TAZ for PD. Expression analysis revealed significantly higher TAZ expression levels in PD patients compared to healthy controls across all datasets (Figures 4A–C). To assess the diagnostic efficacy and accuracy of TAZ, ROC, PR, DCA, nomogram, and calibration analyses were performed across the internal, training, and independent validation sets. The ROC analysis yielded AUCs of 0.821, 0.827, and 0.778 in the internal, training, and independent validation sets, respectively. Collectively, these findings indicate that TAZ is a robust biomarker with favorable diagnostic performance for PD (Figures 4A–C).

Figure 4

Panel A displays data for the internal set with a violin plot comparing control and PD groups, ROC curve with AUC of 0.821, precision-recall curve, and net benefit graph. The calibration plot compares predicted versus actual probabilities. Panel B shows similar analyses for the training set with an AUC of 0.827. Panel C displays the independent validation set, with violin plot and AUC of 0.778, along with corresponding performance evaluation plots. — Cross-validation of the diagnostic performance of TAZ in PD. **(A)** Expression and diagnostic value evaluation of TAZ in the internal set. **(B)** Expression and diagnostic value evaluation of TAZ in the training set. **(C)** Expression and diagnostic value evaluation of the independent validation set.

3.4 Landscape of TAZ at the single-cell level in PD patients

To further elucidate the heterogeneity of the SN and trace the temporal and spatial mechanisms of TAZ in PD patients, we performed single-cell RNA sequencing analysis of GSE140231 (including seven SN samples from PD patients). Rigorous QC confirmed stable sequencing depth, gene counts, and mitochondrial content across samples (Supplementary Figures S2A,B). Heatmap visualization of marker gene expression enabled accurate cell-type annotation, ultimately identifying 19 main cell clusters (Supplementary Figures S2D,E). After annotation, both UMAP and t-SNE plots showed 10 distinct cell types (Figure 5A). Notably, oligodendrocytes and interneurons accounted for the largest proportions, indicating persistent neuroinflammatory and neurodegenerative signaling in PD patients (Ma et al., 2025; Panicker et al., 2022) (Figure 5B). Next, cell-cell communication patterns among these 10 cell types and the corresponding ligand-receptor pairs were analyzed (Figure 5C). We also observed metabolic heterogeneity among these 10 cell types (Figure 5D). TAZ is mainly distributed in neurons and is involved in the neuronal cell cycle (Figure 5F). Pseudotime trajectory analysis suggested that TAZ dynamically regulates neuronal differentiation, with expression peaking during early-to-mid developmental stages and declining in terminally differentiated neurons (Figures 5E,G). We then performed a virtual KO of TAZ in neurons and identified the top 10 DEGs after TAZ KO (Figure 5H). These 10 DEGs were mainly involved in pathways and functions closely related to PD pathogenesis, indicating that TAZ plays a crucial role in PD progression (Figure 5I).

Figure 5

Scientific visualization consisting of multiple panels. Panel A displays two scatter plots with cell type clustering using UMAP and t-SNE projections. Panel B shows a stacked bar chart representing cell type distribution across different categories. Panel C is a circular diagram illustrating gene interactions. Panel D presents a dot plot of gene expression values across cell types. Panel E comprises pseudotime trajectory plots with branching paths. Panel F is a lollipop chart of average gene expression with varying expression percentages. Panel G shows a line graph of TAZ gene expression over pseudotime. Panel H is a bar chart of the top differentially regulated genes after TAZ knockout. Panel I includes dot plots for gene enrichment analysis, displaying gene ratios and FDR values. — Global single-cell analysis of TAZ in PD patients. **(A,B)** UMAP and t-SNE plots display 10 annotated cell types and corresponding cell proportions across 7 samples. **(C)** Cell–cell communication network. **(D)** Metabolic heterogeneity among these 10 cell types. **(E)** Differentiation patterns of neurons. **(F)** Distribution of TAZ among these 10 cell types. **(G)** Pseudotime trajectory analysis of TAZ in neurons. **(H,I)** AI virtual KO of TAZ in neurons.

3.5 In vitro examination of TAZ expression in SY5Y cells and identification of potential therapeutic agents

To experimentally validate the relevance of TAZ in Parkinson’s disease (PD), we measured its expression in MPP⁺-treated SH-SY5Y cells and untreated SH-SY5Y cells. The results showed that TAZ mRNA expression was significantly upregulated in the PD model cells compared to controls (Figure 6A). To identify potential therapeutic agents targeting TAZ, we applied an AI-driven therapeutic screening framework (DrugRefLector) to the integrated GSE20164 and GSE20163 bulk profile (training set) (Figure 6B). The results showed that 10 agents could potentially reverse PD, and BRD-K98481123 was the optimal one (Figure 6B). To assess whether TAZ can be considered a target for BRD-K98481123, we performed molecular docking (Figure 6C). The results indicated that BRD-K98481123 can bind to the C1 cavity pocket of TAZ with favorable binding affinity (−9.7 kcal/mol). Together, these results suggest that TAZ plays a pathogenic role in PD progression and that BRD-K98481123 can be considered a potential therapeutic agent for PD treatment by targeting TAZ.

Figure 6

Panel A shows a bar graph comparing relative TAZ mRNA levels between Control and PD groups, with PD significantly higher. Panel B displays a scatter plot depicting drug activity, with specific compounds labeled, showing high probability for some drugs in PD to HC conversion. Panel C features a molecular structure with highlighted regions in magenta, indicating protein components, and yellow, denoting specific sites. Insets show a close-up of the active site with labeled amino acids, a table of Vina scores and cavity volumes, and a chemical structure. — Experimental validation of TAZ expression and identification of potential therapeutic agents for PD treatment. **(A)** q-RT-PCR analysis of TAZ expression levels between MPP⁺ SY5Y cells and SY5Y cells. **(B)** Candidate drug screening results from DrugRefLector prediction. **(C)** Molecular docking between BRD-K98481123 and TAZ.

4 Discussion and conclusion

In this study, we systematically integrated bulk and single-cell transcriptomics with an artificial intelligence (AI) framework to construct insomnia-related predictive and therapeutic models for PD patients. Our findings identified TAZ as an upregulated diagnostic and druggable target for PD patients. Single-cell analysis revealed that TAZ was mainly distributed in neurons and involved in biological functions and pathways related to PD pathogenesis.

TAZ (WWTR1), a Hippo pathway effector, is mainly located in the cytosol and the nuclear body. In neural regulation, TAZ has distinct roles in modulating the differentiation of astrocytes (Chen et al., 2024). Specifically, TAZ plays a crucial role in overseeing differentiation and maturation as progenitor cells transition into fully developed astrocytes (Chen et al., 2024). Overexpression of TAZ also contributes to the progression of glioblastoma through activation of carcinogenic signals (Pontes and Mendes, 2023). Furthermore, reports have indicated that modulation of Hippo signaling can alleviate motor and non-motor symptoms and restore cognition in PD patients (Choi et al., 2024). Insomnia, a major non-motor complication of PD, significantly affects patients’ quality of life (Henderson, 2022). Previous investigations indicated that administration of a Hippo signaling modulator can reduce neuroinflammation and improve insomnia in an animal model (Choi et al., 2024). However, the definite role of TAZ in PD pathogenesis and the corresponding mechanisms of TAZ in regulating insomnia have not yet been elucidated.

Overall, our integrative approach highlights mechanistic links between insomnia and PD progression at the molecular level using AI pipelines and multi-omics. We also identified the insomnia- and Hippo-related gene TAZ as an upregulated diagnostic biomarker and potential therapeutic target, with corresponding molecular and immune features involved in PD progression. In addition, BRD-K98481123 can be considered a potential therapeutic agent targeting TAZ for the treatment of PD. However, our study has several limitations. First, the mechanisms of TAZ in PD pathogenesis identified in silico should be verified in pre-clinical studies. For example, the predictions of scTenifoldKnk may favor regulatory rather than structural genes, as the latter tend to have a smaller degree in the network (Osorio et al., 2022). Hence, the accuracy of the in silico KO of TAZ should be validated experimentally to enhance the robustness of the results. Second, the therapeutic efficacy of BRD-K98481123 in PD should be assessed in pre-clinical and clinical studies. Third, the diagnostic performance of TAZ for PD should be verified in multi-center studies. Future research should focus on the mechanisms of TAZ in PD pathogenesis and its therapeutic potential.

Statements

Data availability statement

The datasets presented in this study can be found in an online repository, the Gene Expression Omnibus (GEO; https://www.ncbi.nlm.nih.gov/geo/), under the accession numbers GSE20141, GSE7621, GSE20164, GSE20163, GSE20333, and GSE140231.
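
For readers who wish to locate these records programmatically, the following minimal sketch builds the GEO accession page URL for each dataset. It assumes the standard GEO accession viewer endpoint (`acc.cgi` with an `acc` query parameter); the role labels summarize how each dataset is described in the Methods, and the helper name `geo_url` is hypothetical. It only constructs links and does not download any data.

```python
from urllib.parse import urlencode

# Assumed endpoint: the public GEO accession viewer.
GEO_ACCESSION_ENDPOINT = "https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi"

# Accession -> role in this study, as described in the Methods.
DATASETS = {
    "GSE20141": "bulk SN, insomnia-related DEG identification",
    "GSE7621": "bulk SN, insomnia-related DEG identification",
    "GSE20164": "bulk SN, training set for hub gene recognition",
    "GSE20163": "bulk SN, training set for hub gene recognition",
    "GSE20333": "bulk SN, independent validation set",
    "GSE140231": "single-cell PD-SN profile",
}


def geo_url(accession):
    """Build the GEO accession page URL for a series accession such as 'GSE20141'."""
    if not (accession.startswith("GSE") and accession[3:].isdigit()):
        raise ValueError(f"not a GEO series accession: {accession!r}")
    return f"{GEO_ACCESSION_ENDPOINT}?{urlencode({'acc': accession})}"


if __name__ == "__main__":
    for accession, role in DATASETS.items():
        print(f"{accession}\t{role}\t{geo_url(accession)}")
```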

Ethics statement

Ethical approval was not required for the studies involving humans because the study involves the secondary analysis of existing, anonymized data from the publicly available Gene Expression Omnibus (GEO) database and experiments on commercially available cell lines. The studies were conducted in accordance with the local legislation and institutional requirements. The human tissue data analyzed in this study were obtained from the publicly available GEO database. The human cell line (SH-SY5Y) used for experimental validation was obtained from the Shanghai Institute of Cell Biology. Written informed consent to participate in this study was not required from the participants or the participants’ legal guardians/next of kin in accordance with the national legislation and the institutional requirements. Ethical approval was not required for the studies on animals in accordance with the local legislation and institutional requirements because only commercially available established cell lines were used.

Author contributions

WM: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Software, Validation, Visualization, Writing – original draft, Writing – review & editing.

Funding

The author(s) declared that financial support was not received for this work and/or its publication.

Conflict of interest

The author(s) declared that this work was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declared that Generative AI was not used in the creation of this manuscript.

Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fnagi.2026.1727472/full#supplementary-material

References

1. Argüello R. J. Combes A. J. Char R. Gigan J. P. Baaziz A. I. Bousiquot E. et al. (2020). SCENITH: a flow cytometry-based method to functionally profile energy metabolism with single-cell resolution. Cell Metab. 32, 1063–1075.e7. doi: 10.1016/j.cmet.2020.11.007
2. Barut G. T. Kreuzer M. Bruggmann R. Summerfield A. Talker S. C. (2022). Single-cell transcriptomics reveals striking heterogeneity and functional organization of dendritic and monocytic cells in the bovine mesenteric lymph node. Front. Immunol. 13:1099357. doi: 10.3389/fimmu.2022.1099357
3. Berman H. M. Westbrook J. Feng Z. Gilliland G. Bhat T. N. Weissig H. et al. (2000). The Protein Data Bank. Nucleic Acids Res. 28, 235–242. doi: 10.1093/nar/28.1.235
4. Bloem B. R. Okun M. S. Klein C. (2021). Parkinson's disease. Lancet 397, 2284–2303. doi: 10.1016/S0140-6736(21)00218-X
5. Butler A. Hoffman P. Smibert P. Papalexi E. Satija R. (2018). Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nat. Biotechnol. 36, 411–420. doi: 10.1038/nbt.4096
6. Chen B. Khodadoust M. S. Liu C. L. Newman A. M. Alizadeh A. A. (2018). Profiling tumor infiltrating immune cells with CIBERSORT. Methods Mol. Biol. 1711, 243–259. doi: 10.1007/978-1-4939-7493-1_12
7. Chen J. Tsai Y. H. Linden A. K. Kessler J. A. Peng C. Y. (2024). YAP and TAZ differentially regulate postnatal cortical progenitor proliferation and astrocyte differentiation. J. Cell Sci. 137, 1–17. doi: 10.1242/jcs.261516
8. Choi J. Park S. W. Lee H. Kim D. H. Kim S. W. (2024). Human nasal inferior turbinate-derived neural stem cells improve the niche of substantia Nigra par compacta in a Parkinson's disease model by modulating hippo signaling. Tissue Eng Regen Med 21, 737–748. doi: 10.1007/s13770-024-00635-3
9. Davis S. Meltzer P. S. (2007). GEOquery: a bridge between the gene expression omnibus (GEO) and BioConductor. Bioinformatics 23, 1846–1847. doi: 10.1093/bioinformatics/btm254
10. DeMeo B. Nesbitt C. Miller S. A. Burkhardt D. B. Lipchina I. Fu D. et al. (2025). Active learning framework leveraging transcriptomics identifies modulators of disease phenotypes. Science 390:eadi8577. doi: 10.1126/science.adi8577
11. Fang Z. Li J. Cao F. Li F. (2022). Integration of scRNA-Seq and bulk RNA-Seq reveals molecular characterization of the immune microenvironment in acute pancreatitis. Biomolecules 13, 1–14. doi: 10.3390/biom13010078
12. Gu Z. (2022). Complex heatmap visualization. iMeta 1:e43. doi: 10.1002/imt2.43
13. Guan S. Xu Z. Yang T. Zhang Y. Zheng Y. Chen T. et al. (2024). Identifying potential targets for preventing cancer progression through the PLA2G1B recombinant protein using bioinformatics and machine learning methods. Int. J. Biol. Macromol. 276:133918. doi: 10.1016/j.ijbiomac.2024.133918
14. Gustavsson E. K. Zhang D. Reynolds R. H. Garcia-Ruiz S. Ryten M. (2022). Ggtranscript: an R package for the visualization and interpretation of transcript isoforms using ggplot2. Bioinformatics 38, 3844–3846. doi: 10.1093/bioinformatics/btac409
15. Hayes M. T. (2019). Parkinson's disease and parkinsonism. Am. J. Med. 132, 802–807. doi: 10.1016/j.amjmed.2019.03.001
16. Henderson V. W. (2022). Sleep duration, insomnia, and Parkinson disease. Menopause 29, 251–252. doi: 10.1097/GME.0000000000001954
17. Iranzo A. Cochen de Cock V. Fantini M. L. Pérez-Carbonell L. Trotti L. M. (2024). Sleep and sleep disorders in people with Parkinson's disease. Lancet Neurol. 23, 925–937. doi: 10.1016/S1474-4422(24)00170-4
18. Ji L. Song T. Ge C. Wu Q. Ma L. Chen X. et al. (2023). Identification of bioactive compounds and potential mechanisms of scutellariae radix-coptidis rhizoma in the treatment of atherosclerosis by integrating network pharmacology and experimental validation. Biomed. Pharmacother. 165:115210. doi: 10.1016/j.biopha.2023.115210
19. Jia A. Xu L. Wang Y. (2021). Venn diagrams in bioinformatics. Brief. Bioinform. 22:5. doi: 10.1093/bib/bbab108
20. Kalia L. V. Lang A. E. (2015). Parkinson's disease. Lancet 386, 896–912. doi: 10.1016/S0140-6736(14)61393-3
21. Kang J. Choi Y. J. Kim I. K. Lee H. S. Kim H. Baik S. H. et al. (2021). LASSO-based machine learning algorithm for prediction of lymph node metastasis in T1 colorectal Cancer. Cancer Res. Treat. 53, 773–783. doi: 10.4143/crt.2020.974
22. Kim S. (2016). Getting the most out of PubChem for virtual screening. Expert Opin. Drug Discov. 11, 843–855. doi: 10.1080/17460441.2016.1216967
23. Langfelder P. Horvath S. (2008). WGCNA: an R package for weighted correlation network analysis. BMC Bioinform. 9:559. doi: 10.1186/1471-2105-9-559
24. Leek J. T. Johnson W. E. Parker H. S. Jaffe A. E. Storey J. D. (2012). The sva package for removing batch effects and other unwanted variation in high-throughput experiments. Bioinformatics 28, 882–883. doi: 10.1093/bioinformatics/bts034
25. Liberzon A. Birger C. Thorvaldsdóttir H. Ghandi M. Mesirov J. P. Tamayo P. (2015). The molecular signatures database (MSigDB) hallmark gene set collection. Cell Syst 1, 417–425. doi: 10.1016/j.cels.2015.12.004
26. Lin G. Gao Z. Wu S. Zheng J. Guo X. Zheng X. et al. (2024). scRNA-seq revealed high stemness epithelial malignant cell clusters and prognostic models of lung adenocarcinoma. Sci. Rep. 14:3709. doi: 10.1038/s41598-024-54135-4
27. Liu C. He Y. Luo J. (2024). Application of chest CT imaging feature model in distinguishing squamous cell carcinoma and adenocarcinoma of the lung. Cancer Manag. Res. 16, 547–557. doi: 10.2147/CMAR.S462951
28. Ma Q. Tian J. L. Lou Y. Guo R. Ma X. R. Wu J. B. et al. (2025). Oligodendrocytes drive neuroinflammation and neurodegeneration in Parkinson's disease via the prosaposin-GPR37-IL-6 axis. Cell Rep. 44:115266. doi: 10.1016/j.celrep.2025.115266
29. Osorio D. Zhong Y. Li G. Xu Q. Yang Y. Tian Y. et al. (2022). scTenifoldKnk: an efficient virtual knockout tool for gene function predictions via single-cell gene regulatory network perturbation. Patterns (N Y) 3:100434. doi: 10.1016/j.patter.2022.100434
30. Panicker N. Kam T. I. Wang H. Neifert S. Chou S. C. Kumar M. et al. (2022). Neuronal NLRP3 is a parkin substrate that drives neurodegeneration in Parkinson's disease. Neuron 110, 2422–2437.e9. doi: 10.1016/j.neuron.2022.05.009
31. Pontes B. Mendes F. A. (2023). Mechanical properties of glioblastoma: perspectives for YAP/TAZ signaling pathway and beyond. Diseases 11, 1–14. doi: 10.3390/diseases11020086
32. Quan P. Li X. Si Y. Sun L. Ding F. F. Fan Y. et al. (2024). Single cell analysis reveals the roles and regulatory mechanisms of type-I interferons in Parkinson's disease. Cell Commun. Signal 22:212. doi: 10.1186/s12964-024-01590-1
33. Ritchie M. E. Phipson B. Wu D. Hu Y. Law C. W. Shi W. et al. (2015). Limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 43:e47. doi: 10.1093/nar/gkv007
34. Robin X. Turck N. Hainard A. Tiberti N. Lisacek F. Sanchez J. C. et al. (2011). pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinform. 12:77. doi: 10.1186/1471-2105-12-77
35. Sveinbjornsdottir S. (2016). The clinical symptoms of Parkinson's disease. J. Neurochem. 139, 318–324. doi: 10.1111/jnc.13691
36. Tolosa E. Garrido A. Scholz S. W. Poewe W. (2021). Challenges in the diagnosis of Parkinson's disease. Lancet Neurol. 20, 385–397. doi: 10.1016/S1474-4422(21)00030-2
37. Wallace M. L. Mentch L. Wheeler B. J. Tapia A. L. Richards M. Zhou S. et al. (2023). Use and misuse of random forest variable importance metrics in medicine: demonstrations through incident stroke prediction. BMC Med. Res. Methodol. 23:144. doi: 10.1186/s12874-023-01965-x
38. Wang Y. Yuan Y. Wang W. He Y. Zhong H. Zhou X. et al. (2022). Mechanisms underlying the therapeutic effects of Qingfeiyin in treating acute lung injury based on GEO datasets, network pharmacology and molecular docking. Comput. Biol. Med. 145:105454. doi: 10.1016/j.compbiomed.2022.105454
39. Xu M. Zhou H. Hu P. Pan Y. Wang S. Liu L. et al. (2023). Identification and validation of immune and oxidative stress-related diagnostic markers for diabetic nephropathy by WGCNA and machine learning. Front. Immunol. 14:1084531. doi: 10.3389/fimmu.2023.1084531
40. Yang L. Ng Y. E. Sun H. Li Y. Chini L. C. S. LeBrasseur N. et al. (2023). Single-cell Mayo map (scMayoMap): an easy-to-use tool for cell type annotation in single-cell RNA-sequencing data analysis. BMC Biol. 21:223. doi: 10.1186/s12915-023-01728-6
41. Yu G. Wang L. G. Han Y. He Q. Y. (2012). clusterProfiler: an R package for comparing biological themes among gene clusters. OMICS 16, 284–287. doi: 10.1089/omi.2011.0118
42. Zhou J. Feng Z. Lv D. Wang D. Sang K. Liu Z. et al. (2024). Unveiling the role of protein kinase C θ in porcine epidemic diarrhea virus replication: insights from genome-wide CRISPR/Cas9 library screening. Int. J. Mol. Sci. 25, 1–27. doi: 10.3390/ijms25063096

Keywords

artificial intelligence, insomnia, multi-omics, Parkinson’s disease, predictive model, TAZ

Citation

Ma W (2026) Artificial intelligence and multi-omics nominate TAZ as an insomnia-related diagnostic and druggable target for Parkinson’s disease patients. Front. Aging Neurosci. 18:1727472. doi: 10.3389/fnagi.2026.1727472

Received

20 October 2025

Revised

03 January 2026

Accepted

04 January 2026

Published

04 February 2026

Volume

18 - 2026

Edited by

Cristian Falup-Pecurariu, Transilvania University of Brașov, Romania

Reviewed by

Stefania Roxana Diaconu, Transilvania University of Brașov, Romania

Swarna Kanchan, Marshall University, United States

Khairiah Razali, International Islamic University Malaysia, Malaysia

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Wenjing Ma, mawenjing2468@163.com
