Identification of biomarkers for the diagnosis of chronic kidney disease (CKD) with non-alcoholic fatty liver disease (NAFLD) by bioinformatics analysis and machine learning

Background Chronic kidney disease (CKD) and non-alcoholic fatty liver disease (NAFLD) are closely related to immune and inflammatory pathways. This study aimed to explore the diagnostic markers for CKD patients with NAFLD. Methods CKD and NAFLD microarray data sets were screened from the GEO database and analyzed the differentially expressed genes (DEGs) in GSE10495 of CKD date set. Weighted Gene Co-Expression Network Analysis (WGCNA) method was used to construct gene coexpression networks and identify functional modules of NAFLD in GSE89632 date set. Then obtaining NAFLD-related share genes by intersecting DEGs of CKD and modular genes of NAFLD. Then functional enrichment analysis of NAFLD-related share genes was performed. The NAFLD-related hub genes come from intersection of cytoscape software and machine learning. ROC curves were used to examine the diagnostic value of NAFLD related hub genes in the CKD data sets and GSE89632 date set of NAFLD. CIBERSORTx was also used to explore the immune landscape in GSE104954, and the correlation between immune infiltration and hub genes expression was investigated. Results A total of 45 NAFLD-related share genes were obtained, and 4 were NAFLD-related hub genes. Enrichment analysis showed that the NAFLD-related share genes were significantly enriched in immune-related pathways, programmed cell death, and inflammatory response. ROC curve confirmed 4 NAFLD-related hub genes in CKD training set GSE104954 and other validation sets. Then they were used as diagnostic markers for CKD. Interestingly, these 4 diagnostic markers of CKD also showed good diagnostic value in the NAFLD date set GSE89632, so these genes may be important targets of NAFLD in the development of CKD. The expression levels of the 4 diagnostic markers for CKD were significantly correlated with the infiltration of immune cells. Conclusion 4 NAFLD-related genes (DUSP1, NR4A1, FOSB, ZFP36) were identified as diagnostic markers in CKD patients with NAFLD. Our study may provide diagnostic markers and therapeutic targets for CKD patients with NAFLD.


Introduction
Chronic kidney disease (CKD) is defined as structural or functional abnormalities of the kidney caused by various causes for ≥ 3 months (1). Epidemiological studies show that there are approximately 434.3 million people with CKD in Asia, most of whom come from developing countries like China and India (2). The effective control of chronic kidney disease is a huge public health challenge worldwide (3).
Previous studies have suggested that acute kidney injury, hypertension, and diabetes are risk factors for CKD (4). Recently, accumulating evidence indicates that Non-alcoholic fatty liver disease (NAFLD) may be associated with the development of CKD (5)(6)(7)(8).
NAFLD is a heterogeneous disease in which the vast majority are non-alcoholic fatty liver (NAFL) and less than 20% are nonalcoholic steatohepatitis (NASH). Nash has typical characteristics which include inflammation, hepatocyte ballooning, and hepatic injury with or without fibrosis (9,10). NAFLD is often associated with a variety of metabolic diseases, including hypertension, diabetes, insulin resistance, etc, which are risk factors for CKD. However, the degree of fibrosis in NAFLD was independently associated with CKD progression even when confounding factors such as metabolic diseases were excluded (11)(12)(13)(14)(15). Excess fat may association with CKD progression in NAFLD patients by inducing lipotoxicity, inflammation, oxidative stress and fibrosis through pro-inflammatory adipokines and lipocalin (16,17).
Despite growing evidence of the strong association between NAFLD and CKD, the key molecules and potential mechanisms involved remain unclear. Here, bioinformatics and machine learning were attempted to discover the diagnostic markers and related signaling pathways of CKD in the context of NAFLD, which were hoped to provide a basis for the clinical treatment of CKD patients with NAFLD.

Data acquisition and preliminary processing
Four data sets [GSE104954, GSE104948 (18), GSE32591 (19), GSE66494 (20)] containing gene expression profiles for Chronic kidney disease (CKD) samples and one date set [GSE89632 (21)] of non-alcoholic fatty liver disease (NAFLD) were downloaded from the GEO database (https://www.ncbi.nlm.nih.gov/geo/).Details for the data sets were provided in Table 1. The principal search flow of the article was illustrated in Figure 1.

Weighted gene co-expression network analysis and module gene selection in NAFLD patients
The WGCNA method was used to construct gene coexpression networks and identify functional modules. First, the median absolute deviation (MAD) of each gene was determined, and genes with MAD values in the bottom 50% were removed. Second, ineligible genes and samples were removed with the goodSamplesGenes function, and a scale-free coexpression network was built. Third, an appropriate "soft" threshold power (b) was determined to calculate intergenic adjacency; then, the adjacency values were converted into a topological overlap matrix (TOM), and gene proportions and phase dissimilarities are determined. Fourth, modules were detected using hierarchical clustering and dynamic tree cutting functions. Finally, gene significance (GS) and module membership (MM) correlations were calculated, and the corresponding module gene information was extracted for further analysis.

Acquisition of NAFLD-related shared genes
Intersection of DEGs in GSE104954 and WGCNA module genes in GSE89632 were defined as NAFLD-related shared genes, represented by a Venn schema by the online website jvenn (http:// jvenn.toulouse.inra.fr/app/example.html).

Enrichment analysis
For enrichment analysis of NAFLD-related shared genes, Gene Ontology (GO) annotations of genes from the R package org.Hs.eg.db based on the R package "clusterProfiler", and the minimum number of genes per gene set was 5 and the maximum was 5000. Gene annotations for Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways were operated through KOBAS-i online tool (http://kobas.cbi.pku.edu.cn/) (22), and a false discovery rate <0.05 was considered statistically significant.

Establishment of protein-protein interaction network and identification of NAFLD-related hub genes by cytoscape software and machine learning
The NAFLD-related shared genes were uploaded to the STRING database (http://string-db.org/) to construct the PPI network with a PPI score threshold (medium confidence≥0.700). Hub genes were screened by the Cytoscape (Version 3.9.1) plug-in APP MCODE (Version 2.0.0). At the same time, the machine learning methods random forest (RF) were used to screen for hub genes. The MeanDecreaseGini (MDG) was used to measure the importance of genes by the RF algorithm with "randomForest" package, and hub genes were defined as MDG greater than 1.5. The final NAFLD-related hub genes were obtained by the intersection of the results of cytoscape software and machine learning.

Verification of NAFLD-related hub genes expression in the CKD data sets
Data set GSE32591 and GSE66494 were used to identify the expression level of the hub genes. GSE 32591 is composed of 29 control samples and 64 samples with lupus nephritis. GSE66494 contains 53 biopsy samples, including 8 control samples and 45 CKD samples.
2.8 Construction of receiver-operating characteristic curves to assess diagnostic efficacy ROC curves were constructed by "pROC" package in Xiantao Academic (https://www.xiantao.love/products) to evaluate the diagnostic value of NAFLD-related hub genes in the CKD training set GSE104954 and other CKD validation date sets (GSE32591,GSE66494 and GSE104948). Further the diagnostic value of hub gene in the NAFLD data set GSE89632 was also evaluated.

Immune infiltration analysis and correlation analysis
The composition and abundance of 22 types immune cells can be estimated from the CKD and control samples transcriptome in GSE1 04954 date set by CIBERSORTx (https://cibersortx.stanford.edu/). The correlations of NAFLD-related hub genes expression with immune cell infiltrations were investigated in Xiantao Academic, as were their respective correlations.

Statistical analysis
All statistical analyses of bioinformatics studies in this study were conducted using R software. The differences between the groups were tested using a nonparametric Wilcoxon signed-rank test. Correlation analysis was performed using Spearman's correlation. In comparison, p < 0.05 was considered statistically significant (*p < 0.05, **p < 0.01, ***p<0.001, ****p<0.0001).

Key module genes in NAFLD samples were identified by WGCNA
GSE89632 is a representative dataset for investigation of NAFLD and we used it to obtain the most relevant modular genes for NAFLD (23)(24)(25).First, WGCNA was used for the identification of the most relevant modular genes for NAFLD.b = 26 (scale-free R 2 = 0.85) was selected as the "soft" threshold based on the scale independence and average connectivity ( Figure 2A). Then, different colors are chosen to represent 9 gene co-expression Validation of hub genes modules (GCMs), which are presented in Figure 2B. The correlation between NAFLD samples and GCMs is shown in Figure 2C, and the darkturquoise module (412 genes) which was regarded as critical modules demonstrated the highest correlation with NAFLD samples (correlation coefficient = -0.85, p = 5.3e-19). In addition, a significant positive correlation was observed between module membership and gene significance in darkturquoise modules for NAFLD samples (r = 0.68, p=4.2e-57), as shown in Figure 2D.

Identification of NAFLD-related shared genes between CKD and NAFLD
Next, 386 DEGs were found in GSE104954 data set, of which 227 were up-regulated, and 159 of these genes were down-regulated. Figure 3A shows the DEGs by the volcano diagram. The heatmap of the top 50 most significant DEGs in the data set is plotted in Figure 3B. Then, 386 DEGs and 412 module genes were intersected, and 45 NAFLD-related shared genes were subsequently obtained, as presented in the Venn diagram in Figure 3C (Detailed results were provided in Supplementary Materials S1).

Enrichment analyses of 45 NAFLDrelated shared genes
In order to explore the biological functions and pathways of NAFLD-related shared genes in the development of CKD, GO and KEGG enrichment analyses were performed for 45 shared genes. A total of 563 significantly related biological processes and 23 KEGG signaling pathways were obtained (Detailed results were provided in Supplementary Materials S2). GO analysis of shared genes was performed to reveal their biological functions ( Figures 4A-C). As we have seen, in the GO category, most of the share genes mostly involved in the "programmed cell death", "inflammatory response", "positive regulation of metabolic process", and "immune system process"(BP); "Extracellular matrix", "collagen-containing extracellular matrix" (CC); "DNA-binding transcription activator activity, RNA polymerase IIspecific", "DNA-binding transcription factor activity", etc (MF). The results of KEGG pathway enrichment showed that the most involved pathways were the IL-17 signaling pathway, TNF signaling pathway, MAPK signaling pathway, Apoptosis, Toll-like receptor signaling pathway, and so on, which are closely related to the immune response and inflammation ( Figure 4D). Frontiers in Endocrinology frontiersin.org 3.4 Identification of NAFLD-related hub genes via cytoscape software and machine learning and their differential expression was validated To reveal the interaction of each protein, the PPI network of the shared genes were built according to the STRING database. There were 38 edges and 45 nodes in Figure 5A, followed by analysis using Cytoscape software. MCODE plugin was used to discover the important modules in the PPI network and the results showed that 8 hub genes in two clusters were tightly connected as the important modules in Figure 5B. On the other hand,7 hub genes with MeanDecreaseGini> 1.5 were determined by the random forest algorithm in Figure 5C. A Venn diagram in Figure 5D showed the intersection of 4 hub genes (DUSP1, FOSB, NR4A1, ZFP36), which were used as NAFLD-related hub genes. Moreover, in the other two CKD datasets (GSE32591, GSE66494), the 4 NAFLD-related genes were significantly down-regulated ( Figures 6A, B), which was consistent with the change in GSE104954 (Supplementary Materials S3).

The ROC curve was used to evaluate diagnostic efficacy in CKD and NAFLD
The ROC curves of 4 NAFLD-related genes (DUSP1, FOSB, NR4A1, ZFP36) with AUCs of 0.961, 0.954, 0. 866, and 0.960 in the training set GSE104954, respectively ( Figure 7A). Meanwhile, in the validation set GSE32591, the AUCs of these hub genes were 0.828,0.796,0.927 and 0.689, respectively, which did not distinguish whether the source of the sample was glomerular or tubulointerstitial ( Figure 7B). At the same time, in the other validation set GSE66494, the AUCs of hub genes were 0.958,1.000,0.889, and 0.932, respectively ( Figure 7C). Comprehensive analysis of the results of the validation and training sets showed that the 4 NAFLD-related genes can serve as effective markers for the diagnosis of CKD.
GSE104948 and GSE104954, as sister datasets, represent glomerular and tubulointerstitial transcript level information of the same cohort of samples, respectively. 4 NAFLD-associated hub genes have the same good diagnostic efficacy for CKD in GSE104948 (Supplementary Materials S4). Similarly, in the data set GSE32591, both tubulointerstitial and glomerular samples were sampled, and further exploration revealed that the 4 diagnostic markers showed good efficacy in different anatomical structures (Supplementary Materials S5). What should be noted is that these 4 CKD diagnostic markers also have good diagnostic value for NAFLD in GSE89632 date set, and the ROC curves with AUCs of 0.951,0.968,0.974, and 0.915, respectively ( Figure 7D). This finding may suggest that four genes may play a significant role in the development of CKD patients with NAFLD.

Immune infiltration analysis and correlation analysis
According to the results of enrichment analysis, NAFLD-related shared genes may be involved in the immune-related mechanisms When exploring the interrelationships between the expression of the four diagnostic markers genes, it is interesting to note that they were all positively correlated with each other (Figure 8C), suggesting that they may participate in a common mechanism to promote CKD development. Additionally, the interplay of immune cells was explored ( Figure 8D). Treg cells and Mast cells activated had the strongest positive correlation with one another (r = 0.53). In contrast, resting mast cells showed the strongest negative correlation with activated mast cells (r = -0.83).
In summary, CKD samples showed significant changes in immune cell infiltration compared with controls, and the four diagnostic markers genes expression was significantly correlated with immune cell infiltration.

Discussion
NAFLD and CKD are both significant global public health burden, and there is evidence that NAFLD is independently associated with a high risk of CKD despite the exclusion of other metabolic diseases, while the underlying mechanisms are not clear (26). In our study, by obtaining DEGs and important module genes by WGCNA, 45 NAFLD-related share genes were obtained, and their enrichment analysis revealed that immune, inflammatory and programmed cell death pathways were significantly enriched. Further, 4 CKD diagnostic markers genes were obtained by cytoscape software and machine learning, which demonstrated good diagnostic value in both the training and validation sets of CKD. Interestingly, 4 CKD biomarkers also had good diagnostic performance for NAFLD in dataset GSE89632, indicating that they may be important targets for the development of CKD in NAFLD patient. DUSP1, dual-specificity protein phosphatase 1, is a member of the dual-specific phosphatase (DUSPs) family. Mitogen-activated protein kinases (MAPKs) was closely related to inflammation and immune, and DUSP1 improves microvascular fibrosis and inflammation through dephosphorylation of MAPKs (27, 28). Overexpression of DUSP1 alleviates renal tubular injury by regulating mitophagy and interrupte Mff-related excessive mitochondrial fission. At the same time, lncRNA NR_038323 reduced the degree of renal fibrosis by targeting DUSP1, suggesting that DUSP1 is a potential therapeutic target for CKD with NAFLD (29)(30)(31). However, the main evidence come from diabetic renal disease, and whether it is applicable to other types of CKD requires further study.
FOSB is a member of the FOS family, which is part of activator protein-1 (AP-1). AP-1 is associated with immune and cancer progression. Previous studies have shown that FOSB can be used as a diagnostic marker for lgA kidney disease (32). It has also been shown that MicroRNA-27a-3p targets FOSB to regulate the level of inflammation and fibrosis in lgA nephropathy (33). Zinc finger protein 36 (ZFP36) participates in posttranscriptional regulation by targeting different mRNAs, which was closely related to inflammatory diseases and autoimmune disease (34). The A B FIGURE 6 The downregulation of 4 NAFLD-related hub genes was verified by two CKD data set. (A) Expression of NAFLD-related hub genes in the GSE32591 date set. (B) Expression of NAFLD-related hub genes in the GSE66494 date set. NAFLD, Non-alcoholic fatty liver disease; CKD, chronic kidney diseases.
A B D C FIGURE 7 The diagnostic efficacy of 4 NAFLD-related hub genes was verified by ROC curve in CKD and NAFLD data sets. (A-D) The ROC curve of 4 NAFLDrelated hub genes in date set GSE104954, GSE32951, GSE66494 and GSE89632. dysregulated expression of ZFP36 may play an important role in the pathogenesis of inflammatory diseases including CKD. While it has been suggested that it could be used as a diagnostic marker of CKD (35,36). Further studies are needed to identify the underlying mechanism for FOSB and ZFP36 in CKD with NAFLD.
The orphan nuclear receptor 4A1 (NR4A1), which is also known as Nur77, belongs to the nuclear receptor superfamily, and is involved in inflammation and energy metabolism pathways (37). Previous studies have shown that it can be used as a therapeutic target for chronic kidney disease, which is consistent with our study (38). There are also studies show that Yiqi Huoxue Tongluo recipe, a traditional Chinese medicine, can alleviate renal inflammation and fibrosis by increasing the expression level of NR4A1. In contrast, the loss of NA4A1 results in increased kidney ingury associated with macrophage. Interestingly, in our results, NR4A1 expression was significantly negatively correlated with proinflammatory M1 macrophage infiltration. A recent study showed that the induction of anti-inflammatory macrophages expressing NR4A1/EAR2 could suppress M1 proinflammatory responses, thereby inhibiting immune-mediated crescent glomerulonephritis (39-41). Therefore, increasing the expression level of NR4A1 may be one of the potential strategies for the treatment of CKD with NAFLD.
Dysfunction of immune cells promotes inflammation and kidney fibrosis in CKD, so the immune infiltration status of CKD samples was explored (42). Previous studies have demonstrated that macrophage polarization plays an important role in CKD development (43). In our results, M1 macrophages were significantly upregulated. However, M2 macrophages are believed to be associated with fibrosis, but not M1 macrophages. While the role of M2 macrophages in CKD is contradictory (44,45).
Tregs cells are down-regulated in CKD, which is consistent with our results (46). There is some evidence to suggest that Tregs cells can inhibit inflammation and fibrosis in CKD (47). In turn, the CKD microenvironment changes the energetic metabolism of Tregs cells, thus inhibiting the protective effect of Tregs (48). Therefore, immune cells interact with the inflammatory microenvironment. B cells are thought to be involved in the progression of Membranous nephropathy (MN) by releasing antibodies against podocytes, so depletion therapy targeting B cells may be a potential treatment (49). However, existing data suggest that depletion of B cells does not achieve the expected effect in lgA nephropathy (50). Because of the significant heterogeneity of CKD, the potential significance of B-cell depleting therapy requires specific analysis (51). The MC-specific protease tryptase is released by mast cells, thereby activating significant fibrosis and inflammation (52). In our results, the expression of four CKD diagnostic markers was closely related to the infiltration of multiple immune cells, which also confirmed the important role of immune mechanisms in the development of inflammatory and fibrosis in CKD patients with NAFLD.
There are limitations to our study. First, CKD is an umbrella term with significant heterogeneity, and we were not able to analyze specific types of CKD; Second, we focus on the pathogenesis and diagnostic markers of CKD in the context of NAFLD, and it is important to note that CKD has the opposite effect on NAFLD, which is beyond the scope of our discussion; Third, our findings were required to validate in vivo and in vitro to better guide clinical practice, although the decreased expression of DUSPI and ZFP36 in CKD has been confirmed by related studies (36,53).

Conclusion
In this study, 4 NAFLD-related genes (DUSP1, NR4A1, FOSB, ZFP36) were identified as diagnostic markers in CKD patients, and NAFLD may accelerate the development of CKD through immune and inflammatory pathways. The changes in immune cell infiltration in CKD and the significant correlation with diagnostic markers were also elucidated. Our study may provide diagnostic markers and therapeutic targets for CKD patients with NAFLD.

Data availability statement
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/Supplementary Material.

Author contributions
YC, YD, and WJ contributed equally to this work. YC and YD wrote the manuscript. WJ performed the data processing. JD, JY and HZ Participated in chart making. XZ edited the article. KT and ZY conceived and designed the scientific question. All authors contributed to the article and approved the submitted version.