Bioinformatics Analysis Reveals Crosstalk Among Platelets, Immune Cells, and the Glomerulus That May Play an Important Role in the Development of Diabetic Nephropathy

Diabetic nephropathy (DN) is the main cause of end stage renal disease (ESRD). Glomerulus damage is one of the primary pathological changes in DN. To reveal the gene expression alteration in the glomerulus involved in DN development, we screened the Gene Expression Omnibus (GEO) database up to December 2020. Eleven gene expression datasets about gene expression of the human DN glomerulus and its control were downloaded for further bioinformatics analysis. By using R language, all expression data were extracted and were further cross-platform normalized by Shambhala. Differentially expressed genes (DEGs) were identified by Student's t-test coupled with false discovery rate (FDR) (P < 0.05) and fold change (FC) ≥1.5. DEGs were further analyzed by the Database for Annotation, Visualization, and Integrated Discovery (DAVID) to enrich the Gene Ontology (GO) terms and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway. We further constructed a protein-protein interaction (PPI) network of DEGs to identify the core genes. We used digital cytometry software CIBERSORTx to analyze the infiltration of immune cells in DN. A total of 578 genes were identified as DEGs in this study. Thirteen were identified as core genes, in which LYZ, LUM, and THBS2 were seldom linked with DN. Based on the result of GO, KEGG enrichment, and CIBERSORTx immune cells infiltration analysis, we hypothesize that positive feedback may form among the glomerulus, platelets, and immune cells. This vicious cycle may damage the glomerulus persistently even after the initial high glucose damage was removed. Studying the genes and pathway reported in this study may shed light on new knowledge of DN pathogenesis.


INTRODUCTION
Diabetic nephropathy (DN) is one of the most serious diabetic chronic microvascular complications and the major cause of end stage renal disease (ESRD) (1,2). Glomerulus damage is one of the primary pathological changes in DN (3). The progression of DN is known to occur in a series of pathological changes in the glomerulus, such as expansion of glomerular mesangium, glomerular basement membrane (GBM) thickness, and podocytes loss. These changes damage glomerular filtration, causing proteinuria and glomerulosclerosis. Eventually, this may cause a decrease in the glomerular filtration rate (GFR) and the development of end stage renal disease (4). Now, DN is the main reason for dialysis or a kidney transplant and is a great global public health burden (5). Currently, drugs only target the renin-angiotensin-aldosterone system (RAAS) and sodiumglucose cotransporter 2 (SGLT2) inhibitors to treat DN (5)(6)(7). Therefore, it is urgent to explore the newly found molecular mechanism of DN and provide a new target for the diagnosis and treatment of DN.
Transcriptomic analysis is a powerful tool used to discover new targets and explore many diseases including DN (8). A lot of work has been done using the transcriptomic method, which has provided some novel targets and mechanisms for DN (9)(10)(11)(12)(13)(14)(15)(16)(17). However, as is known, the transcriptomic method has some limitations. This method can only use a single race sample and a small sample number, which is disproportionate to their high costs. And most transcriptomic methods have poor instability for their great measurement error (9). So, gene screening by different transcriptomic research methods vary and even conflict sometimes. Bioinformatics tools can integrate multiple transcriptomic methods to increase the statistical power. So, reduced population samples and many stable differentially expressed genes can be obtained (17). Bioinformatic tools were used in a lot of studies to analyze existing transcriptomics data and some important discoveries were found. Tang et al. analyzed glomerulus and renal tubule tissue transcription omics data, and found that NTNG1 and HGF were potential DN biomarkers of high specificity and sensitivity (18). Wang et al. found that the glomerulus in DN kidney tissue mainly caused changes in cell connectivity and tissue cell modification, while renal tubular tissue mainly caused abnormalities in energy metabolism, and changes in methylation status of core regulatory genes might be a potential factor for the pathogenesis of DN (19). As far as we know, research has seldom used bioinformatic tools to analyze all human existing glomerulus transcriptomics datasets to discover new potential biomarkers and the pathogenesis of DN. So it will be very attractive to do it.
Bioinformatic tools combining the information of multiple independent transcriptomic studies fundamentally include metaanalysis and cross-platform normalization (20). In the metaanalysis approach, each experiment is first analyzed separately and then combined by one of three types of statistics: p-value, effect size, and ranked gene lists. Cross-platform normalization considers all platform transcriptomic data as a single dataset. This approach normalizes transcriptomic data to remove the artifactual differences between transcriptomic studies and preserves biological differences between conditions. Crossplatform normalization is thought to have better performance than meta-analysis for "separate statistics" and "averaging is often less powerful than directly computing statistics from aggregated data" (20). We can always significantly find more differentially expressed genes in this process than meta-analysis (20). There is more than a dozen methods that can be used to undertake cross-platform normalization. But most of these cross-platform normalization methods can only process two different platforms, and transcriptomic data have comparable sample sizes. Recently a new method Shambhala (https://github.com/oncobox-admin/ harmony) was found to solve this problem and may be the best choice to process gigantic transcriptomic datasets (21). Shambhala performs cross-platform normalization by using an auxiliary calibration dataset (P0) and a reference definitive dataset (Q). The initial data can be output into a generic form of a gene expression profile. This method can make experimental data independent of the experimental platform and the number of experiments (21). It can improve data comparability and reduce batch effect greatly (21). Above all, it is currently the only platform-independent data coordination technology that supports the processing of data obtained from multiple experimental platforms (21). Application of this technique may be the best choice to analysis all human existing glomerulus transcriptomics datasets.
Our objective is to comprehensively analyze transcriptomic profiles of all existing DN patient glomerular datasets in the GEO database for understanding of the pathogenesis of DN. In this study, we firstly downloaded all DN patients and their control glomerular original transcriptomic data. Then we used Bioconductor packages to extract the transcriptomic profiles and perform cross-platform normalization (Shambhala method), a static tests screening, and identification of differentially expressed genes (DEGs). We used the Database for Annotation, Visualization, and Integrated Discovery (DAVID) to enrich the Gene Ontology (GO) DEGs and the Kyoto Encyclopedia of Genes Genomes (KEGG) pathway. We constructed a protein-protein interaction (PPI) network and modules to screen core genes. We further used CIBERSORTx to explore the infiltration of immune cells in the DN glomerulus.

Dataset Selection
The Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih. gov/geo/) is an international public repository containing highthroughput microarray and next-generation sequence functional genomic datasets, which can provide researchers with a large number of gene expression profile data (22,23). At present, a large number of microarray data of different diseases have been collected in the National Center for Biotechnology Information (NCBI, http://www.ncbi.nlm.nih.gov/) database for sharing and learning by institutes around the world. In this report, in order to obtain transcripts related to human DN, we searched all GEO datasets in the NCBI database with details of "diabetic nephropathy [Mesh], " "glomerular [Mesh], " and "Homo sapiens [porgn:__txid9606]" before December 2020. A total of 21 datasets were retrieved. We defined the following exclusion criteria: (i) cell line sample; (ii) a sample of biological fluids, including blood, plasma, and urine; (iii) samples of tissues that have undergone special interventions, such as drug stimulation, hypoxia, and oxygen-enriched treatment, etc. After having filtered other tissues or diseases out, GSE96804 (24), GSE104948 (13), GSE99339 (25), GSE30528 (9), GSE21785 (26), and GSE47183 (15,27) were selected for subsequent analysis. As there are fewer control samples in these datasets, we manually retrieved human glomerular datasets as control. Datasets GSE20602 (28), GSE121233 (29), GSE108109 (13), GSE104066 (30), and GSE32591 (31) were included. These dataset have similar characteristics as the control in the DN datasets, which were marked as "glomeruli from living human donor kidney biopsy" and "glomeruli from the unaffected portion of tumor nephrectomies." Based on all the above datasets, a total of 90 DN and 95 healthy control glomerular samples were included in this study (see Table 1 for details). The samples were collected from multiple platforms, including Affymetrix GeneChip, Human Genome HG-U133A Custom CDF, and the Affymetrix Human Gene 2.1 ST Array.

Data Preprocessing and Identification of DEGs
For the preprocessing of a large number of multi-platform microarray datasets, we first used Affy1.64.0 (http:// bioconductor.org/packages/release/bioc/html/affy.html) (32) and Oligo 1.50.0 (http://bioconductor.org/packages/release/ bioc/html/oligo.html) (33) from Bioconductor in R (3.60) to extract gene expression value. Briefly, after downloading all raw data ( Table 1) from the GEO repository, probe expression values were extracted by Affy according to the user guide. After reading the raw data, background correction (rma), normalization (quantiles), probe specific background correction (pmonly), and summary (liwong) were performed to obtain the probe expression value. If the raw data could not be extracted by Affy, the Oligo package was used following the user guide. After reading the raw data, further background subtraction, normalization, and summarization was performed by using rma. All probes were further annotated to genes by their own annotation data. The median of the probe expressions was calculated as the gene expression value. After merging all transcriptomics data into a large sample and removing none of the express genes, Shambhala (https:// github.com/oncobox-admin/harmony) was used to perform cross-platform normalization according to the guidance of literature. In this research, a longer P0 was used in Shambhala which was kindly provided by developers and can be found in Supplementary Table 1. This auxiliary calibration dataset contains 13,645 genes which is much longer than what is provided on the website. The levels of each gene expression difference between control and DN were compared by Student's t-test coupled with a false discovery rate (FDR) correction. In this study, genes conforming to the fold change (FC) ≥1.5 and P < 0.05 (Student's t-test adjusted by FDR) were considered as DEGs.

GO Terms and KEGG Pathway Enrichment of DEGs
GO and KEGG enrichment of the candidate genes were performed using the DAVID online tool (https://david.ncifcrf. gov) (34). GO analysis is a bioinformatics tool that presents information on the biological domain with respect to molecular function (MF), cellular components (CC), and biological processes (BP) (35). KEGG is a database that displays information of system integration gene functions (36). The enrichment significance threshold was set to P < 0.05. The visualization of GO enrichment results was conducted by using the GO plot package in the R software (37). To determine the changed tendency of pathways in DN, the Z-score was calculated in each term using the following formula: The N up and N down separately represent the number of upregulated and downregulated genes between DN and normal controls, and the count is the number of DEGs belonging to this term (37).

PPI Network Construction and Module Analysis
Cytoscape 3.8.0 was used for visualization and analysis of the complex network (38). In order to avoid the loss of the proteinprotein interaction in a single database, we integrated PPI information collected from multiple databases. We imported network of DEGs by querying the Proteomics Standard Initiative Common QUery Interface (PSICQUIC) which is integrated in Cytoscape (39,40). Four protein interaction databases were selected for analysis: (i) STRING (https://string-db.org/) (41) which integrates data from high-throughput experiments, text mining, bioinformatics prediction, and interaction databases, (ii) MINT (https://mint.bio.uniroma2.it/) (42) in which PPIs have been confirmed experimentally, (iii) IntAct (https://www. ebi.ac.uk/intact/) (43) which is directly submitted by users, and (iv) Reactome (http://www.reactome.org) (44) which is a pathway database that provides intuitive bioinformatics tools for the visualization, interpretation, and analysis of pathway knowledge. The former three databases focus on exploring the physical interactions between proteins and the last one focuses on biological pathways. After excluding non-human gene information, the analysis results were merged to obtain more comprehensive protein-protein interaction information. In a PPI network, degree and betweenness centrality (BC) are commonly used to evaluate the critical degree of nodes. Degree is the basic index of a node, which is used to indicate the number of links that interact with the node and the network (45). BC measures the importance of nodes based on the shortest paths, which represent the shortest distance between two nodes. A node with a greater BC value has a higher frequency of information exchange within the node (46,47). In this study, nodes with a high degree and high BC were regarded as key nodes, and genes whose BC and degree were both in the top 10% in the total nodes of the network were regarded as important genes. In this study, we computed the properties of nodes and measured the default parameters with Cytoscape. Next, we used the Cytoscape plug-in Molecular Complex Detection tool (MCODE; version 1.5.1) (48) to identify the most important module in the network map. The criteria for MCODE analysis were a degree of cut-off = 2, MCODE scores >6, maximum depth = 100, node score cut-off = 0.2, and k-score = 2(48).

Core Gene Identification
We selected DEGs that met the following three constraints as core genes: (i) DEGs that had a large fold change (top 100); (ii) the gene was located in key module; and (iii) nodes with top 10% BC values and degree determined by Cytoscape software.

Immune Cells Infiltration in DN Glomerular Tissue
CIBERSORTx (https://cibersortx.stanford.edu) (49) is a digital cytometry program that uses a machine learning method. It can provide an estimation of the abundances of member cell types in a mixed cell population by using gene expression data. We used a validated leukocyte gene signature matrix that contained 547 genes to distinguish 22 human hematopoietic cell phenotypes to identify glomerular immune cells infiltration. Seven T cell types, naive and memory B cells, plasma cells, natural killer (NK) cells, and myeloid subsets infiltration alteration were identified (50). After uploading the cross-platform normalized data to CIBERSORTx, permutations were set at 100 and absolute mode was selected. Absolute mode scales relative cellular fractions into a score that reflects the absolute proportion of each cell type in a mixture. When the p < 0.05, it indicates that the infiltration rate of the 22 immune cells types analyzed by CIBERSORTx is accurate. The accurately identified immune cell infiltration was further compared between normal control and DN by Wilcoxon signed-rank test. The steps of the whole process are shown in Figure 1.

Identification of DEGs
All microarray datasets were standardized, and the results before and after standardization are shown in Figures 2A,B.
According to values of p < 0.05 and FC ≥ 1.5, a total of 578 genes were identified to be differentially expressed in the DN group, including 334 upregulated and 244 downregulated genes ( Figure 2C and Supplementary Table 2). DEGs with the top 100-fold change are shown in the heatmap (Figure 2D).

GO and KEGG Pathway Analysis
GO analysis was performed based on the 578 DEGs, and circle graphs show the top 10 entries of each term. BP demonstrated that the DEGs were enriched in lipopolysaccharide, extracellular matrix organization, angiogenesis, inflammatory response, leukocyte migration, and platelet degranulation, etc. (Figure 3A).
Variations in DEGs linked with CC were extracellular exosome, extracellular matrix, focal adhesion, and platelet alpha granule lumen, etc. (Figure 3B). Regarding MF, DEGs were significantly enriched in heparin binding, integrin binding, collagen binding, and extracellular matrix structural constituent, etc. (Figure 3C). Analysis of KEGG pathways indicated that canonical pathways associated with DEGs were complement and coagulation cascades, staphylococcus aureus infection, and ECM-receptor interaction, etc. The top 15 KEGG enrichment results are shown FIGURE 1 | The workflow of microarray data preprocessing and subsequent analysis in this study. We selected 11 datasets based on the constraints. Firstly, raw expression data were extracted after background correction and quality control. Processing was carried out using the R package, and then the data were cross-platform normalized by coordination and transformation using Shambhala. Then DEGs of DN glomerular tissue and healthy control tissue were identified by static analysis. Enrichment results were obtained using DAVID, then PPI networks and modules were constructed, and core genes were identified. Finally, the dataset was brought into the CIBERSORTx web portal to evaluate immune cell infiltration.
in Figure 3D. The complete enrichment analysis results are in Supplementary Table 3.

PPI Network Construction and Module Screening
The DEG expression products in DN were constructed into PPI networks by merging the STRING, MINT, IntAct, and Reactome databases in Cytoscape software. By removing the separated and separately connected nodes, a complex network of DEGs was constructed and is presented in Figure 4A. Three modules were identified by MCODE (Figures 4B-D). molecule (CD44), lysozyme (LYZ), Fos proto-oncogene (FOS), early growth response 1 (EGR1), albumin (ALB), plasminogen (PLG), epidermal growth factor (EGF), and dual-specificity protein phosphatase-1 (DUSP1). The details of core genes are shown in Figure 5B.

Infiltration of Immune Cells in DN
Since inflammation is enriched in GO and KEGG, it will be interesting to specify which immune cells infiltrated the glomerular under DN. CIBERSORTx is a bioinformatics tool that can specifically analyze the infiltration of immune cells in tissues.  (Figure 6).

DISCUSSION
DN is a serious complication of long-term diabetes mellitus (DM) and a growing global economic burden (51). The glomerulus plays a key role in the development of DN. However, due to the complexity of etiology and ethnic differences, our understanding of the molecular mechanism in DN glomerular tissue is still incomplete. Therefore, it is urgent to explore the new molecular mechanism which may help DN treatment and diagnosis.
In this study, we used bioinformatics methods to analyze GEO transcriptomics datasets before December 2020, and attempted to explore the potential molecular mechanisms of the DN glomerulus. A total of 578 DEGs were identified in the glomerular samples between DN and normal samples, including 334 upregulated and 244 downregulated genes. Thirteen core genes were finally identified, including C3, FN1, COL1A2, LUM, THBS2, CD44, LYZ, FOS, EGR1, ALB, PLG, EGF, and DUSP1.
Among these core genes, some have been shown to play an important role in the pathogenesis of DN, C3 (52,53), ALB (54)(55)(56), EGF (57), EGR1 (58-61), COL1A2 (62), FN1 (63,64), CD44 (65,66), FOS (67), PLG (68,69), and DUSP1 (70). It is wellknown that inflammation and fibrosis play an important role in the pathogenesis of the DN glomerulus (71). Among these 13 core genes, there is little known about what roles LYZ, LUM, and THBS2 play in the development of DN. LYZ, which encodes lysozymes, is an antimicrobial agent found in human milk. It is also found in the spleen, lungs, kidneys, white blood cells, plasma, saliva, and tears. Gallo et al. found that LYZ downregulated the production and release of inflammatory mediators [such as interleukin (IL)-6] induced by late glycosylation end products in in vitro models of human proximal renal tubular epithelial cells, and prevented the recruitment of some macrophages at the inflammatory site (72). Indicating that locally expressed LYZ may take part in the pathogenesis of DN. LUM encodes members of the leucinerich small proteoglycan (SLRP) family, which includes decorin, biglycan, and fibromodulin, etc. (73). The protein expressed by this gene partially binds collagen fibers, and highly charged hydrophilic glycosaminoglycans regulate the spacing between fibers (74). It has been reported that the LUM protein and its family member decorin accumulate strongly in the advanced glomerulosclerosis stage of DN (75). Decorin greatly affects the progression of DN by forming the ternary complex of decorintype I collagen-transforming growth factor, beta (TGF-β) (75). It is speculated that LUM may also promote the development of DN by interacting with TGF-β. THBS2 proteins belong to the thrombospondin family. As a relatively special member of this family, it has an anti-angiogenic effect and interacts with various cell receptors and growth factors to regulate cell proliferation, apoptosis, and adhesion (76). It has been shown that THBS2 plays an important role in acute kidney injury (AKI) (77). It also has been found differently expressed in the plasma of type 2 diabetes patients (78). This indicates that THBS2 plays an important role in DN.
We enriched the GO terms and KEGG pathway of DEGs. Among the enriched pathways, inflammatory response, leukocyte migration, platelet degranulation, and platelet alpha particles attracted our attention. We further used CIBERSORTx to identify immune cell infiltration in DN glomerular tissues. Three types of T cells increased infiltration in the DN tissues, including naive CD4+ T cells, regulatory T cells, and γδT cells. The originally resting NK cells in the tissues were activated, and the macrophages were also differentiated from resting M0 into the classically activated and pro-inflammatory M1 and the alternatively activated M2. The active DCs were reduced to resting. The infiltration of mast cells was increased and the infiltration of plasma cells and neutrophils was decreased in DN glomerular tissue. These results imply that humoral immunity and cellular immunity is altered in DN patient glomerular tissue. These results imply that there may be crosstalk among the glomerulus, platelets, and immune cells.
Therefore, there may be positive feedback among the glomerulus, platelets, and immune cells. This vicious cycle may damage the glomerulus for a long time even after the initial high glucose damages have been removed. This may be a reason why renal damage in DN patients still progresses even after blood glucose was strictly controlled. The hypothesized crosstalk among platelets, immune cells, and glomerular cells are shown in the schema (Figure 7).

CONCLUSION
In summary, we found three core genes that may be associated with the pathogenesis of DN (LYZ, LUM, and THBS2). Furthermore, our further bioinformatics analysis suggested that there might be positive feedback among platelets, immune cells, and the glomerulus. And this feedback may damage the glomerulus for a long time even after the initial high glucose damages have been removed. These findings may provide new ideas for the pathogenesis and treatment of DN. However, due to the lack of experimental verification in this study, further studies are needed.

AUTHOR CONTRIBUTIONS
ZL, HS, and HZ conceived and designed the study. XY, HS, FC, HH, BL, XZ, HZ, and ZL performed bioinformatics and correlation analyses. XY wrote the first draft. ZL and HS revised the manuscript. All authors contributed to the article and approved the submitted version.

FUNDING
This work was supported by the Natural Science Foundation of Hebei Province (H2020209243) and the National Natural Science Foundation of China (81773958 and 81873140).

ACKNOWLEDGMENTS
We thank Nicolas Borisov for his gift of the long P0 datasets.