Identification of biomarkers and prediction of upstream miRNAs in diabetic nephropathy

Objective To explore biomarkers of diabetic nephropathy (DN) and predict upstream miRNAs. Methods The data sets GSE142025 and GSE96804 were obtained from Gene Expression Omnibus database. Subsequently, common differentially expressed genes (DEGs) of renal tissue in DN and control group were identified and protein-protein interaction network (PPI) was constructed. Hub genes were screened from in DEGs and made an investigation on functional enrichment and pathway research. Finally, the target gene was selected for further study. The receiver operating characteristic (ROC) curve was used to evaluate the diagnostic efficiency of target gene and predicted its upstream miRNAs. Results 130 common DEGs were obtained through analysis, and 10 Hub genes were further identified. The function of Hub genes was mainly related to extracellular matrix (ECM), collagen fibrous tissue, transforming growth factor (TGF) -β, advanced glycosylation end product (AGE) -receptor (RAGE) and so on. Research showed that the expression level of Hub genes in DN group was significantly higher than that in control group. (all P<0.05). The target gene matrix metalloproteinase 2 (MMP2) was selected for further study, and it was found to be related to the fibrosis process and the genes regulating fibrosis. Meanwhile, ROC curve analysis showed that MMP2 had a good predictive value for DN. miRNA prediction suggested that miR-106b-5p and miR-93-5p could regulate the expression of MMP2. Conclusion MMP2 can be used as a biomarker for DN to participate in the pathogenesis of fibrosis, and miR-106b-5p and miR-93-5p may regulate the expression of MMP2 as upstream signals.


Introduction
Diabetic nephropathy (DN) is a common microvascular complication of diabetes and a major cause of end-stage renal disease (ESRD) throughout the world (1). DN is characterized by glomerular basement membrane thickening, mesangial expansion and accumulation of matrix, and eventually progression to fibrosis of glomeruli and renal tubules. Its pathogenesis is very complex, involving advanced glycosylation end products (AGE), inflammation, oxidative stress, apoptosis, autophagy and various signaling mechanisms leading to extracellular matrix (ECM) deposition and renal interstitial fibrosis (2). Finally, it leads to the appearance of albuminuria and the progressive decrease of estimated glomerular filtration rate (eGFR). Therefore, new identification and therapeutic tools to improve DN are urgently needed.
miRNA is an endogenous non-coding RNA (about 22 nucleotides long) that usually targets the 3 'untranslated region (3' -UTRs) of mRNA, resulting in post-transcriptional gene silencing (3). A large amount of evidence has shown that miRNA is involved in the occurrence of diabetes and its complications, such as DN, neuropathy, retinopathy, cardiomyopathy and wound healing. Zhong et al. found that miR-21 is highly expressed in the renal cortex of diabetes mice with microalbuminuria and positively regulated the expression of ECM by regulating transforming growth factor (TGF)-b, Nuclear factor kB (NF kB) and Smad homolog 7 (SMAD7) play a pathological role in renal fibrosis and inflammation (4). Therefore, regulation of miRNA is promising in the treatment of DN.
Bioinformatics, which research objects mainly focus on gene and protein, has played a vital role in the research of life science with its rapidly development in recent years. Although many mechanisms are involved in the pathogenesis of DN, the potential bioinformatics driving its pathogenesis is rarely fully elucidated. In this study, the gene expression data of DN kidney tissue from the Gene Expression Omnibus (GEO) was analyzed to predict the key genes causing renal fibrosis and explore their upstream miRNAs. This provides a new idea for the early diagnosis and treatment of DN. The specific working flow chart is shown in Figure 1.

Microarray data
The DN gene expression data sets were retrieved in the NCBI-GEO database (https://www.ncbi.nlm.nih.gov/geo/) using the keywords "diabetic nephropathy" and "diabetic kidney disease". Finally, two renal tissue gene expression data sets GSE142025 and GSE96804 with relatively large sample size were finally obtained. GSE142025 is based on GPL20301 platform, including 27 cases of DN and 9 cases of normal control group (5). GSE96804 is based on GPL17586 platform, including 41 cases of DN and 20 cases of renal tissue control not affected by tumor (6). After downloading the gene expression matrix, the probe identification number (ID) was converted into gene symbol using R software (version 4.2.2). For multiple probes corresponding to a gene, the average expression value was taken as the gene expression value.

Differential expression analysis
Limma software package can perform differential expression analysis on RNA sequencing data and high-throughput data, which contains powerful functions of reading, standardizing and exploring data (7). The R software and the limma package in Bioconductor were used to identify differentially expressed genes (DEGs) between DN and control group. The ratio of gene expression between the two groups (fold change, FC) was calculated, and the logarithm base 2 (log2FC) and adjusted P-value were selected. In line with the adjusted P -value < 0.05, | log2FC | > 1 genes are considered to be DEGs. Negative values represent down-regulated gene, while positive values represent up-regulated gene. Use ggplot 2 package (R software) to make volcanic maps. Then, through the online website (http://bioinformatics.psb.ugent.be/webtools/Venn/) draw Venn diagram to identify common DEGs for subsequent analysis.

PPI network construction and hub gene identification
STRING( https://cn.string-db.org/) (version 11.5) is an online tool for interacting gene retrieval, which provides new insights into the molecular mechanism of disease occurrence by integrating protein interactions (8). Protein-protein interaction (PPI) networks for common DEGs were predicted by STRING. The confidence score (>0.40) was used as the screening criteria, and the results were visualized by the CytoScape software (version 3.8.0). CytoScape is an open source bioinformatics software platform for visualizing molecular interaction networks. CytoHubba is a tool for defining network topology to find Hub genes. Subsequently, MCC algorithm of CytoHubba plugin was Work flow diagram includes data preparation, processing and analysis.
used to select the top 10 genes with the highest node connection closeness as the Hub genes, and boxplot was drawn to show the expression level.

Biological function and pathway analysis
DAVID database (https://david.ncifcrf.gov/) (version DAVID2021), an online gene function classification tool, through the powerful aggregation algorithm, complete the annotation, visualization and integration for gene and protein function (9). Gene Ontology (GO) analysis is a bioinformatics method that annotates genes and their protein products. It is widely used in analyzing the functional similarity between genes and identifying biological functions and pathways related to diseases by highthroughput biological data analysis (10). The Kyoto Encyclopedia of Genes and Genomes (KEGG) is a public and comprehensive database for understanding the advanced functions of cells and organisms from genomics and other high-throughput data sets, integrating genomes, biological pathways, diseases, drugs and chemicals (11,12). In order to understand the function of Hub genes, we used the DAVID database for GO functional analysis and KEGG pathway enrichment. Typically, GO analyses are annotated with biological processes (BP), cellular components (CC), and molecular functions (MF). P<0.05 was considered statistically significant. If there are more than 10 results, this study will only show the top 10 results in P-value ranking.

Genomic variation analysis and correlation analysis
The first gene ranked by MCC algorithm among Hub genes was selected as the target gene analysis. Each sample in the dataset GSE142025 was scored for fibrosis enrichment using the GSVA package in Bioconductor and R software, and the results were visualized using heat maps. Subsequently, the correlation between target genes and common fibrosis genes was analyzed by Pearson correlation. The genes related to fibrosis function were collected from the GSEA (http://www.gsea-msigdb.org/gsea/index.jsp) (version 4.3.2).

ROC curve analysis
In order to effectively distinguish patients with DN from the control group, we analyzed the ROC curve of the target gene using the pROC package (R software).

miRNA prediction
Encyclopedia of RNA Interactors (ENCORI) database (https:// starbase.sysu.edu.cn/) is an open source network tool for studying the interactions among ncRNAs and RNA-RNA (13). Based on the ENCORI database, we predicted the upstream miRNAs of the Hub genes, and the screening results were jointly calculated by miRanda and Targetscan plugins. The results were presented in the network diagram by the CytoScape software.

Statistical analysis
Graphpad Prism (version 9.3.0) and R software were used for statistical analysis. The Hub genes expression analysis uses t test, and the pROC package (R software) was used for ROC curve analysis to calculate the area under the curve (AUC).

Identification of DEGs
Take adjusted P-value<0.05, | log2FC |>1 as the screening criteria. 1166 DEGs were obtained from the GSE142025 dataset, including 660 up-regulated and 506 down-regulated genes. 609 DEGs were obtained from GSE96804 dataset, including 283 upregulated and 326 down-regulated genes. The volcano maps were drawn separately to show the DEGs of two data sets (Figures 2A, B), and 130 common DEGs were identified by Venn diagram ( Figure 2C). Table 1 lists common DEGs, in which NPIPB5, FAM151A and NKG7 have opposite expressions in the two data sets are not listed.

PPI network and hub gene screening
PPI network analysis was performed on 130 common DEGs using STRING and Cytoscape for visualization ( Figure 2D). The CytoHubba plugin was used for PPI network correlation analysis, and 10 Hub genes were identified according to MCC algorithm ( Figure 2E). In order of rank, they were MMP2, FN1, COL1A2, COL3A1, COL6A3, LUM, VCAN, FBN1, THBS2 and FOS. Finally, MMP2, which was ranked first, was selected as the target gene for further analysis.

GO function and KEGG pathway enrichment analysis
GO functional analysis of the Hub genes showed that biological processes were mainly enriched in skeletal system development, heart development, cellular response to amino acid stimulus, collagen fibril organization, TGF-b receptor signaling pathway, cell adhesion, extracellular matrix organization, cellular response to reactive oxygen species and other functions ( Figure 3A). In terms of cell components, genes were mainly enriched in ECM, extracellular region, and organelles of endocellular structures ( Figure 3B). At the molecular level, genes were enriched in ECM structural components, ECM structures related to tensile and compressive strength, protease binding, integrin binding and heparin binding ( Figure 3C). KEGG pathway analysis showed that the Hub genes was enriched in ECM-receptor interaction, AGE-RAGE signaling pathway in diabetic complications, relaxin signaling pathway focal adhesion, PI3K-Akt signaling pathway, and so on ( Figure 3D). (all P<0.05). These findings indicate that Hub gene is involved in the occurrence of DN and plays an important role in regulating ECM formation and fibrosis.

DEGs
Gene names

MMP2 participates in the development of fibrosis
Renal fibrosis is one of the main causes of DN, which is characterized by the activation and proliferation of fibroblasts and the deposition of ECM (14). Matrix metalloproteinase-2(MMP2) play an important role in renal fibrosis and DN (15). Therefore, we investigated the effect of MMP2 activation on fibrotic pathways and cytokine characteristics. GSVA was used to determine the enrichment fraction of the fibrosis process. The results showed that in the GSE142025 database, MMP2 expression was positively correlated with most fibrosis functions, including stress fiber assembly, elastic fiber assembly, outer dense fiber, contraction fiber and other functions ( Figure 4A). The correlation was tested by Person correlation analysis. In both data sets, the correlation coefficients between MMP2 and common fibrosis genes were positive numbers ( Table 2). It was shown that MMP2 was positively correlated with common fibrosis genes such as TGFB1, FN1 and CXCL6  Frontiers in Endocrinology frontiersin.org ( Figures 4B, C). These results support the hypothesis that MMP2 is involved in the regulation of fibrosis during the development of DN.

The expression of Hub gene and its diagnostic value in DN
In the GSE142025 dataset, compared with the control group, the expressions of MMP2, FN1, COL1A2, COL3A1, COL6A3, LUM, VCAN, FBN1, THBS2 in DN renal tissue were up-regulated, while FOS expression was down-regulated (P<0.05). This result was verified in the GSE96804 dataset ( Figures 5A, B). The target gene MMP2 was selected for ROC curve analysis, and the results show that AUC in the data sets GSE142025 and GSE96804 were 95.1% and 90.4% respectively ( Figures 5C, D). This suggests that MMP2 is significantly enriched in DN and can be used as a potential biomarker to predict its occurrence.

Discussion
DN is the most common microvascular complication of diabetes, increasing in incidence year on year and becomes the main cause of end-stage renal disease (16). At present, there are few studies on the genome of DN. Exploring its changes in the genome, analyzing the involved biological functions and predicting the possible molecular mechanisms will be helpful for the diagnosis and treatment of DN.
In this study, the common DEGs of DN and the control group were obtained based on the gene chip matrix, and 10 Hub genes were finally selected for GO functional analysis and KEGG pathway enrichment. It is found that the functions of Hub genes are mainly concentrated in "collagen fiber tissue, ECM components, TGF-b receptor signaling pathway" and other aspects. This is consistent with the previous study. Some scholars found that genes related to angiogenesis, ECM secretion and immune cell infiltration were involved in the occurrence of diseases through single cell sequencing of DN glomerulus. At the same time, this study revealed that although eGFR was relatively normal at the early  stage of the disease, there were mild to moderate glomerulosclerosis and interstitial fibrosis (17). In DN patients and animal models, urinary TGF-b level was higher than control group, and plasma TGF-b1 level was closely correlated with the severity of renal dysfunction (18). Overproduction of TGF-b1 in the glomerular apparatus led to proteinuria, polyuria, decreased eGFR, and increased ECM deposition in the glomeruli, indicating that TGFb participated in the occurrence of DN (19). These studies are basically consistent with the functions enriched by Hub genes, suggesting that these genes have diagnostic and therapeutic value for DN. The enrichment of KEGG pathway showed significant enrichment in "ECM-receptor interaction, AGE-RAGE signaling pathway in diabetic complications, PI3K-Akt signaling pathway, adhesion plaque" and other aspects. It was found that AGE can induce collagen production, and inhibition of collagen production can significantly reduce the severity of DN. At the same time, AGE can activate TGF-b/Smad pathway and enhanced TGF-b Transcriptional activity. In addition, the crosstalk mechanism of AGE (RAGE)-ERK/p38MAKPs-Smad can activate Smad2/3 to play an important role in DN (19). This indicates that AGE-RAGE and TGF-b pathway related genes play an important role in the process of DN fibrosis. On the other hand, AGE-RAGE signaling pathway is related to the production of reactive oxygen species (ROS) and mitochondrial dysfunction. The inflammatory reaction mediated by NF-kB and PI3K-Akt signaling pathways triggered by ROS aggravates the renal injury in diabetes (20). Therefore, it is speculated that genes involved in the PI3K-Akt pathway and AGE-RAGE pathway have a hand in DN in the induction of oxidative stress and inflammation response.
For this reason, improving renal fibrosis plays a positive role in the treatment of DN. In this context, we aimed to further explore the related genes involved in fibrosis, finally selected the target gene MMP2 for further research, explored the miRNAs acting on this gene, and find a new therapeutic target to improve DN.
MMP2 plays a key role in the development of chronic kidney disease by remodeling the extracellular matrix, distorting the structure of the glomerular basement membrane, contributing to the development of tubulointerstitial fibrosis and leading to progressive kidney injury (15). The study found that the expression of MMP2 in renal tissue of DN patients increased (21), and the serum level of MMP-2 was significantly higher than that of normal healthy subjects (22). Serum creatinine, eGFR and proteinuria were remarkable correlated with serum MMP-2 levels in patients with DN (22). This was consistent with our research findings, the expression level of MMP2 in DN nephridial tissue was significantly higher than in the control group. At the same time, GSVA analysis indicated that MMP2 was bound up with the most fibrosis processes. Correlation analysis showed that the expression of MMP2 was positively correlated with common fibrosis genes such as TGFB1, FN1, CXCL6, etc. Based on the above results, we predicted that MMP2 could be used as a relevant target for the diagnosis of DN and drug intervention. Subsequently, ROC curve analysis verified that MMP2 could effectively distinguish patients with DN from the control group. The involvement of MMP2 in the occurrence of chronic kidney disease also contains other mechanisms, such as inducing the production of mitochondrial ROS, mitochondrial autophagy, activating systemic inflammatory response, and infiltration of renal monocytes, which cause structural abnormalities in the kidney (23). This requires us to further study its mechanism in DN.
miRNA is found to regulate the signaling pathways related to inflammation, autophagy, apoptosis and fibrosis, and participate in the occurrence of DN. It is reported that many miRNAs, like miR-21, miR-217, miR-216a, miR-200, miR-195, miR-451, miR-141, miR-93, miR-29, etc regulate DN signal transduction by combining with 3'UTR of target genes (24). Similarly, multiple drugs play a protective role in DN by regulating miRNA expression. Resveratrol upregulates autophagy and inhibits apoptosis by inhibiting the expression of miR-383-5p (25). It could also inhibit apoptosis by up regulating the expression of miR-18a-5p, thus having beneficial effects on the kidney (26). Hyperoside could cause epigenetic changes by down-regulation of mi-R21 and increase the level of MMP-9 protein (27). Through the above mechanism, hyperoside could regulate ECM and have an impact on the renal function of DN mice (28). For this reason, this study predicted the upstream miRNAs regulating MMP2, and identified that hsa-miR-17-5p, hsa-miR-20a-5p, hsa-miR-93-5p, hsa-miR-106a-5p, hsa-miR-106b-5p, hsa-miR-20b-5p and hsa-miR-519d-3p could regulate the expression of MMP2. Previous study has shown that miR-106b-5p is involved in the development of chronic thromboembolic pulmonary hypertension by negatively regulating the expression of MMP2 (29). miR-93-5p inhibits the proliferation, invasion and migration of tumor cells by targeting MMP2 in glioma (30). This provides a theoretical basis for miR-106b-5p and miR-93-5p to regulate MMP2 and thus delay the development of DN, but the specific regulatory mechanism needs further investigation.
There are some limitations for this research. First of all, there are few data sets available for us to choose. The number of samples of DN patients is limited and clinical data of samples cannot be obtained. Secondly, when GSVA analysis is conducted, due to FIGURE 6 Regulatory network of MMP2 and upstream miRNAs.
Yin et al. 10.3389/fendo.2023.1144331 differences in sequencing genes obtained from different samples, all data sets cannot be used for analysis. Thirdly, the biomarkers identified in this study have not been experimentally verified, and the relevant mechanisms will be further investigated in future studies.

Conclusions
We determined that MMP2 is a gene closely related to DN fibrosis, which can effectively distinguish patients with DN from the control group. It is predicted that miR-106b-5p and miR-93-5p are upstream signals that regulate the expression of MMP2 and thereby produce a marked effect in DN.

Author contributions
Research design, data processing and interpretation, statistical analysis and manuscript drafting were completed by DY. Research concept, data results check and manuscript knowledge revision were completed by ZG. Figures and diagrams drawing were completed by XZ. All authors contributed to the article and approved the submitted version.