Identification of a miRNA–mRNA regulatory network for post-stroke depression: a machine-learning approach

Objective The study aimed to explore the miRNA and mRNA biomarkers in post-stroke depression (PSD) and to develop a miRNA–mRNA regulatory network to reveal its potential pathogenesis. Methods The transcriptomic expression profile was obtained from the GEO database using the accession numbers GSE117064 (miRNAs, stroke vs. control) and GSE76826 [mRNAs, late-onset major depressive disorder (MDD) vs. control]. Differentially expressed miRNAs (DE-miRNAs) were identified in blood samples collected from stroke patients vs. control using the Linear Models for Microarray Data (LIMMA) package, while the weighted correlation network analysis (WGCNA) revealed co-expressed gene modules correlated with the subject group. The intersection between DE-miRNAs and miRNAs identified by WGCNA was defined as stroke-related miRNAs, whose target mRNAs were stroke-related genes with the prediction based on three databases (miRDB, miRTarBase, and TargetScan). Using the GSE76826 dataset, the differentially expressed genes (DEGs) were identified. Overlapped DEGs between stroke-related genes and DEGs in late-onset MDD were retrieved, and these were potential mRNA biomarkers in PSD. With the overlapped DEGs, three machine-learning methods were employed to identify gene signatures for PSD, which were established with the intersection of gene sets identified by each algorithm. Based on the gene signatures, the upstream miRNAs were predicted, and a miRNA–mRNA network was constructed. Results Using the GSE117064 dataset, we retrieved a total of 667 DE-miRNAs, which included 420 upregulated and 247 downregulated ones. Meanwhile, WGCNA identified two modules (blue and brown) that were significantly correlated with the subject group. A total of 117 stroke-related miRNAs were identified with the intersection of DE-miRNAs and WGCNA-related ones. Based on the miRNA-mRNA databases, we identified a list of 2,387 stroke-related genes, among which 99 DEGs in MDD were also embedded. Based on the 99 overlapped DEGs, we identified three gene signatures (SPATA2, ZNF208, and YTHDC1) using three machine-learning classifiers. Predictions of the three mRNAs highlight four miRNAs as follows: miR-6883-5p, miR-6873-3p, miR-4776-3p, and miR-6738-3p. Subsequently, a miRNA–mRNA network was developed. Conclusion The study highlighted gene signatures for PSD with three genes (SPATA2, ZNF208, and YTHDC1) and four upstream miRNAs (miR-6883-5p, miR-6873-3p, miR-4776-3p, and miR-6738-3p). These biomarkers could further our understanding of the pathogenesis of PSD.

Results: Using the GSE dataset, we retrieved a total of DE-miRNAs, which included upregulated and downregulated ones. Meanwhile, WGCNA identified two modules (blue and brown) that were significantly correlated with the subject group. A total of stroke-related miRNAs were identified with the intersection of DE-miRNAs and WGCNA-related ones. Based on the miRNA-mRNA databases, we identified a list of , stroke-related genes, among which DEGs in MDD were also embedded. Based on the overlapped DEGs, we identified three gene signatures (SPATA , ZNF , and YTHDC ) using three machine-learning classifiers. Predictions of the three mRNAs highlight four miRNAs as follows: miR--p, miR--p, miR--p, and miR--p. Subsequently, a miRNA-mRNA network was developed.
Conclusion: The study highlighted gene signatures for PSD with three genes (SPATA , ZNF , and YTHDC ) and four upstream miRNAs (miR--p, miR--p, miR--p, and miR--p). These biomarkers could further our understanding of the pathogenesis of PSD.

. Introduction
Post-stroke depression (PSD) is a common complication after a stroke, affecting up to one-third of stroke survivors (1). It compromises individual functional recovery, impairs the quality of life, and increases the burden on the healthcare system (2). It was observed that depressive symptoms were negatively correlated with functional recovery (3) and related to higher mortality (4). However, the pathogenesis of PSD remains elusive to date. MicroRNAs (miRNAs) are a group of small non-coding RNAs with downregulative activity on post-transcriptional gene expression by binding to the 3 ′ untranslated regions of target mRNAs (5,6). Profiles of miRNA expression could be as useful as mRNA in diagnosis and prognosis (7). Of note, miRNAs can be secreted into body fluids, including peripheral blood and urine, which can be non-invasively accessed for detection (8).
As such, serum miRNAs have been widely studied in various diseases, including stroke as potential biomarkers (9)(10)(11)(12). With the development of RNA sequencing technology, data sharing, and machine-learning methods, the identification of feature serum miRNAs and the construction of related signatures in a large number of subjects have become practical. According to previous studies, serum miRNA-based signatures could effectively predict the risk of strokes in healthy individuals (13) and clinical outcomes (14,15) in patients with neurological tumors. Moreover, miRNAs are functionally involved in numerous biological processes, including cellular metabolism (16), cell-cycle regulation (17), and immune response (18). Therefore, we hypothesized that miRNAs and their target mRNAs could be potential biomarkers implicated in the pathogenesis of PSD.
With the advancement of the machine-learning methods, the identification of relevant biomarkers becomes practical in the big data era. The least absolute shrinkage and selection operator (LASSO), a regression analysis algorithm, uses regularization to improve prediction accuracy (19). The support vector machine (SVM) is a supervised machine-learning technique widely utilized for both classification and regression (20). Random forest (RF) is considered the most accepted group classification technique because of having excellent features such as variable importance measure and out-of-bag error (21). In this study, we aimed to explore the miRNA as well as mRNA biomarkers in PSD and to construct a miRNA-mRNA regulatory network to reveal its potential pathogenesis using a machine-learning approach.
. Materials and methods

. . Data source
The miRNAs expression profile was obtained from the GEO database (https://www.ncbi.nlm.nih.gov/geo/) using the accession number GSE117064, while mRNA expressions of blood samples from patients with late-onset MDD and controls were embedded in GSE76826. A total of 1,785 serum samples were incorporated into the dataset, which consists of 173 samples of patients with stroke and 1,612 controls. Extraction, detection, and data processing of serum miRNAs were provided in the previous report (13). Stroke was diagnosed based on physical and neurological examinations supplemented with brain imaging data, including computed tomography and/or magnetic resonance imaging, while healthy controls were defined as having no history of stroke and negative on medical checkup in a clinic (13). In GSE76826, there were 12 blood samples of late-onset MDD and 10 samples of controls. Late-onset MDD was defined according to the DSM-IV diagnosis and age of ≥ 50 years (22).
. . Di erential analysis, WGCNA, and identification of stroke-related genes The linear models for microarray data (LIMMA) package (23) in R software was applied to extract differentially expressed miRNAs (DE-miRNAs) between stroke and control samples. The pvalue was adjusted with the false discovery rate (FDR) (24). An FDR of < 0.05 and | FC| of > 1 were set as the threshold for DE-miRNAs. The visualization of differential analysis was presented with a heatmap and a volcano plot. Using GSE117064, the weighted gene correlation network analysis (WGCNA) (25, 26) was performed to construct a co-expression network to identify hub miRNA modules using the "WGCNA" package. Filtered miRNAs were employed to construct a scale-free network by calculating the connection strength between miRNAs. We assessed the correlation among miRNA modules as well as their correlations to the clinical group (stroke vs. control). Subsequently, DE-miRNAs incorporated in the stroke-related modules were identified as stroke-related miRNAs, and their target genes were predicted using miRDB (https://mirdb. org/), miRTarBase (https://mirtarbase.cuhk.edu.cn/~miRTarBase/ miRTarBase_2022/php/index.php), and TargetScan (https://www. targetscan.org/vert_80/).

. . Identification of overlapped DEGs in post-stroke depression
Overlapped DEGs for post-stroke depression were defined as aberrantly expressed mRNAs in the blood sample collected from stroke patients as well as late-onset MDD. In other words, these overlapped DEGs were implicated in two diseases simultaneously. Differentially expressed mRNAs (DE-mRNAs) for late-onset MDD were defined in a similar manner with an FDR of < 0.05 and | FC| of > 0.5. Among these DE-mRNAs, stroke-related genes were selected to identify overlapped DEGs for post-stroke depression. With these overlapped DEGs, the clusterProfiler R package (27)   were utilized to build gene signatures for PSD using the overlapped DEGs. Gene signatures were established with the intersection of gene sets identified by each algorithm. The receiver operating characteristic (ROC) curves were mapped for the identified gene signatures, where the area under curves (AUCs) was calculated as an indicator of classification. The AUCs of > 0.8 were considered excellent classification, while AUCs of > 0.7 were considered acceptable. By matching miRNA-mRNA pairs in multiple databases (miRDB, miRTarBase, and TargetScan), the upstream miRNAs were predicted using the gene signatures for PSD. Subsequently, a potential miRNA-mRNA regulatory network for PSD was established.

Results
. . Di erential analysis, WGCNA, and identification of stroke-related genes The overall analysis of the study was presented in the Graphical abstract. Using the LIMMA package, we retrieved a total of 667 DE-miRNAs, which included 420 upregulated and 247 downregulated ones. The feature DE-miRNAs with the absolute value of logFC of >2 were marked in green in the volcano plot ( Figure 1A). Meanwhile, a heatmap showing the top 100 DE-miRNAs is presented in Figure 1B. The details of these DE-miRNAs can be found in Supplementary material 1.
We performed the hierarchical clustering of the samples (Figure 2A), and the soft-thresholding power was set at 5 with the cutoff score of Scale-free R 2 being 0.9 ( Figure 2B). The clustering dendrograms of the sample identified five modules ( Figure 2C) and their correlations with clinical groups were presented in the heatmap plot ( Figure 2D). Subsequently, we selected the blue and brown modules with the highest correlation coefficient for downstream analysis. The scatter plots in Figures 2E, F show a significant correlation between gene significance and module memberships in the aforementioned modules, while the details can be accessed in Supplementary material 2.
Stroke-related miRNAs were identified with the intersection of DE-miRNAs and modules of interest found in WGCNA, leading to the identification of 117 miRNAs ( Figure 3A). A total of 2,387 target mRNAs were predicted with these stroke-related miRNAs according to the databases. Thus, these mRNAs were identified as stroke-related genes.

. . Identification of overlapped DEGs in post-stroke depression
Using mRNA expression profiles in GSE76826, we identified 641 DEGs with 10 samples from late-onset MDD and 12 controls. To find potential biomarkers in PSD, we intersected stroke-related genes with DEGs observed in MDD patients. As a result, 99 DEGs were identified ( Figure 3B). The list of the 99 DEGs can be accessed in Supplementary material 3.
Thereafter, using clusterProfiler in R, GO functional and KEGG enrichment analyses were performed on the 99 DEGs to further understand their biological functions. As represented in Figure 3C, the biological process (BP) was significantly enriched in T cell differentiation in the thymus, mononuclear cell differentiation, cellular response to glucose starvation, regulation of striated muscle cell differentiation, regulation of lymphocyte apoptotic process; the cellular component (CC) was particularly enriched in the cytoplasmic side of the plasma membrane, cytoplasmic side of the membrane, nuclear speck, podosome, heterotrimeric G-protein complex; and molecular function (MF) was mainly enriched in DNA-binding transcription activator activity, RNA polymerase IIspecific, DNA-binding transcription activator activity, cytokine receptor binding, platelet-derived growth factor receptor binding, and mitogen-activated protein (MAP) kinase activity.  Furthermore, the KEGG pathway analysis of the 99 DEGs is shown in Figure 3D. Among all the pathways enriched, the top five most significant pathways were as follows: sphingolipid signaling pathway, EGFR tyrosine kinase inhibitor resistance, human cytomegalovirus infection, FoxO signaling pathway, and MAPK signaling pathway. Among them, MAPK signaling pathways have been associated with the pathophysiology of PSD in several studies (28)(29)(30).
The PPI network of the 99 DEGs was constructed by the STRING online database with high confidence of >0.4 applied. The disconnected nodes (genes) were removed from the PPI network ( Figure 4A). The network was then presented using the cytoHubba tool in Cytoscape ( Figure 4B); the top 10 hub genes were as follows: TP53, MAPK14, VEGFA, GPR29, CD40LG, SMAD3, GNAQ, PTEN, IL7R, and IL6R.
. . Selection of feature mRNA by machine-learning algorithms and construction of a miRNA-mRNA network For a further selection of the mRNA features for post-stroke depression, we used the LASSO algorithm to identify a set of 13 mRNAs (Figure 5A), the SVM algorithm to select a set of 10 mRNAs ( Figures 5B, C), and the RF algorithm to select a set of 23 mRNAs ( Figure 5D). Specifically, a total of 99 genes were selected by the SVM algorithm to construct the classification model using a 10-fold cross-validation. When 10 genes were selected, the accuracy of the model was the highest (Figures 5B, C). The RF algorithm screened out a total of 99 genes, and the top 23 genes with positive values of importance were selected (Supplementary material 4).
RF: The RF algorithm screened out a total of 99 genes, and the top 23 genes with positive values of importance were selected (datasheet attached). After combining the mRNAs screened out via the LASSO, SVM, and RF algorithms, three diagnostic mRNAs (SPATA2, ZNF208, and YTHDC1) were identified for post-stroke depression ( Figure 5E). These genes were also validated using the ROC curves with AUCs > 0.89 (Figures 6A-C). The prediction of these genes revealed a panel of four miRNAs, with which the genes form a miRNA-mRNA regulatory network, as presented in Figure 6D.

. Discussion
In the present study, we identified stroke-related genes through differential analysis, WGCNA, as well as target .
/fneur. . prediction via miRNAs. Subsequently, 99 overlapped DEGs were identified in late-onset MDD to reveal potential biomarkers in PSD. Enrichment analysis revealed that these genes were implicated in pathways related to sphingolipid signaling, EGFR tyrosine kinase inhibitor resistance, human cytomegalovirus infection, FoxO signaling, and MAPK signaling. Furthermore, three machine-learning algorithms were employed to explore gene signatures for PSD, which was validated with the ROC curves. At last, a miRNA-mRNA network was constructed.
Our study highlighted gene signatures for PSD with three genes: SPATA2, ZNF208, and YTHDC1. However, there were no reports on their role in PSD. SPATA2 enables signaling receptor complex adaptor activity and ubiquitin-specific protease binding activity (31). SPATA2 is involved in several processes, including protein deubiquitination, necroptotic process, and tumor necrosis factor-mediated signaling pathway (32,33). The knockdown of SPATA2 leads to the activation of P38MAPK and NLRP3 inflammasome and the enhancement of NF-κB signaling, indicating that SPATA2 plays a protective role against brain inflammation induced by ischemia/reperfusion injury (34). ZNF208 polymorphisms were observed to be associated with ischemic stroke in a Chinese Han population (35); however, no other reports concerning its role in stroke or depression were reported. An increasing number of studies have shown that YTHDC1, an important N6-methyladenosine (m6A) reader, plays a key role in multiple biological functions as well as in disease progression. It was observed that YTHDC1 could be protective against ischemic stroke by enhancing Akt phosphorylation via destabilizing PTEN mRNA (36). Therefore, it could be a potential therapeutic target for ischemic stroke. With the gene signatures for PSD, we predicted a list of four miRNAs (miR-6883-5p, miR-6873-3p, miR-4776-3p, and miR-6738-3p) based on the databases. Subsequently, a miRNA-mRNA network was developed, and it could shed light on the pathogenesis of PSD.
Emerging studies have investigated the role of miRNA-mRNA networks in the pathogenesis and progression of diseases, such as HBV-related hepatocellular carcinoma (37), stroke due to atrial fibrillation (38), and MDD in ovarian cancer patients (39). These studies revealed potential mechanisms by which the existing risk factor or disease contributes to the development of specific complications. In the case of PSD, datasets were exploited to find overlapped DEGs in stroke and late-onset MDD. Using three machine-learning classifiers, we further selected three feature genes and four upstream miRNAs, which could be potential targets for PSD treatment. To the best of our knowledge, the present study was the first to depict a miRNA-mRNA network for PSD; further investigations with a focus on the biological functions of these miRNAs and mRNAs are necessary. Our study suffered from a lack of wet lab validation, which may be the major limitation. However, the role of the miRNAs and mRNAs on PSD was first reported in our study, which could be of interest to further studies.

Data availability statement
The original contributions presented in the study are included in the article/Supplementary material, further inquiries can be directed to the corresponding authors.

Ethics statement
Ethical review and approval was not required for the study on human participants in accordance with the local legislation and institutional requirements. Written informed consent from the patients/participants or patients/participants' legal guardian/next of kin was not required to participate in this study in accordance with the national legislation and the institutional requirements.

Author contributions
HQ, YM, and YS: data curation, formal analysis, roles/writingoriginal draft, and writing-review and editing. LS: data curation. YM and YS: conceptualization, design, and methodology. All authors contributed to the article and approved the submitted version.

Funding
The study was funded by the National Key R&D Program of China (2022YFC2009700 and 2022YFC2009701) and Basic Science (Natural Science) Research Project of Higher Education Institutions in Jiangsu Province (No. 23KJB320001).