Machine Learning Reveals Ets2 as a Novel Target for Membranous Nephropathy Treatment and Its Role in Immune Infiltration

Background Membranous nephropathy (MN) is a common pathological phenotype for adult nephrotic syndrome (NS). The occurrence of MN is increasing across China, but diagnostic methods for MN still rely on kidney biopsy and PLA2R and THSD7A detection in plasma and kidney tissue, and there has been no new biomarker for MN discovered since 2014. Immune infiltration status in MN patients suffers from the dearth of associated studies. In the present study, we aimed to find new bio-markers for MN and evaluate the role of immune cells infiltration in MN pathology. Methods We downloaded MN expression profile from the Gene Expression Omnibus database and used R-project to screen differentially expressed genes (DEGs) and performed functional correlation analysis. Least absolute shrinkage and selection operator (LASSO) logistic regression and Radom Forest algorithms were used to screen and verify the bio-markers of MN. Finally, CIBERSORT was used to evaluate the infiltration of immune cells in MN tissues. Results A total of 463 DEGs were screened from the MN tissue in this study. ETS2 was identified as bio-marker for MN. The CIBERSORT results showed that there were statistical differences in monocytes, plasma cells, regulatory T cells, and memory B cells. In addition, ETS2 was positively related to monocytes, M1 phase macrophages, and neutrophils and negatively correlated to plasma cells, CD4+ T memory cells, M2 macrophages, CD8+ T cells, memory B cells, and resting mast cells. Conclusion (1) Machine learning algorithms reveals Ets2 as a novel target for membranous nephropathy patients. (2) Immune infiltration plays an important part in membranous nephropathy. (3) Ets2 expression is related to immune cells infiltration.


INTRODUCTION
Membranous nephropathy (MN) is one of the common causes of nephrotic syndrome (NS) in the adult population (1). Recent studies confirmed the occurrence of MN in primary glomerulopathy (2). The typical pathological changes for MN include diffuse thickening of the glomerular capillary basement membrane and diffuse granular deposits of IgG with or without C3 (3). Currently, the diagnosis of MN relies on kidney biopsy and detection of PLA2R (4) and THSD7A (5) in plasma and kidney biopsy tissue: there has been no new biomarker for MN discovered since 2014 (6). Finding new biomarkers for MN diagnosis and treatment target is important.
Based on previous studies on human MN patient kidney biopsy and animal models, MN pathophysiological changes involved B cell activation (7) and immune infiltration caused by B cell activation including macrophages (8) and T cells (9). Recently, Rituximab, a monoclonal antibody against CD20, which can cause B cell degeneration, had been approved as a first-line treatment for MN patients (10), indicating that immune infiltration, especially B cell action, plays an important role in MN activation. However, there has been no large-scale immune infiltration study on human MN patients based on RNA sequencing and the CIBERSORT algorithm.
Machine learning algorithms such as least absolute shrinkage and selection operator logistic regression (LASSO) and random forest algorithms were widely used in clinical practice. These machine learning algorithms provided new insight in COVID-19 mortality (11), progressive of idiopathic pulmonary fibrosis (12) and cancer outcomes (13). Based on these studies, we could use machine learning algorithms to find novel targets for MN patients.
In the present study, a microarray dataset of MN downloaded from the Gene Expression Omnibus (GEO) database was used. We performed differential expression gene analysis and used machine learning algorithms to screen and identify suitable targets for MN treatment and diagnosis. CIBERSORT (14) was then used to ascertain the difference in immune infiltration between normal tissue and MN tissue. In addition, MN treatment target and its relationship with immune infiltration in 22 immune cells were determined.

Data Download
The "GEOquery" package in R (Version 4.10) was used to download the MN expression profile datasets GSE99340 and GSE108113 from the GEO database.

Data Processing and Deg Screening
The expression matrices of GSE99340 and GSE108113 were downloaded and combined. The control group and MN group samples were selected according to patient data ( Table 1). Interbatch differences were removed using the "sva" package. Two PCA cluster plots were used to demonstrate the effect of inter-sample correction. DEGs were scanned using the "limma" package and "ggplot2" software was used to show differential expression of DEGs.

Functional Analysis
The "clusterProfiler" package was used to perform Gene Ontology (GO) and Disease Ontology (DO) enrichment analyses on DEGs, respectively. KEGG pathway enrichment analyses were also conducted on the gene expression matrix through the "clusterProfiler" package. A false discovery rate (FDR) < 0.25 and p < 0.05 were considered to represent significant enrichment.

Screening and Verification of Treatment Target
We used least absolute shrinkage and selection operator logistic regression (LASSO) and random forest algorithms to perform feature selection to screen treatment target for MN. The LASSO algorithm was applied using the "glmnet" package, and the random forest algorithm was established using the "randomForest" package the further to evaluate treatment targets. The GSE108113 matrix was used as a test matrix to examine random forest algorithm results. Then, we combined genes from the results from running the LASSO and random forest algorithms.

Evaluation of Immune Cell Infiltration
CIBERSORT R-packages and the LM22 document were used to run the algorithm locally (14); we filtered out those samples with p < 0.05 and obtained the immune cell infiltration matrix. The "corrplot" package was used to draw a correlation heatmap to visualize the correlation between infiltrated immune cells; the "ggplot2" package was employed to plot diagrams allowing visualization of the correlations in immune cell infiltration.

Correlation Analysis of Treatment Target and Immune Cell Infiltration
Spearman correlation analysis was applied to the markers and infiltrating immune cells and the "ggplur" package was used to visualize the results.

Data Processing and DEGs Screening
First, we merged expression profile datasets GSE99340 and GSE108113. The "sva" package was then used to remove inter-batch differences between different expression matrices. The merged expression matrix was normalized and processed: two PCA cluster diagrams (before and after normalization, Figures 1A,B) were used, indicating that the sample source was reliable. After data normalization, R-project was used to extract a total of 463 DEGs from the gene expression matrix, as shown in volcano map ( Figure 1C). The heatmap of the top-50 DEGs demonstrated that DEGs were expressed differently in normal and MN samples (Figure 2A).

Functional Correlation Analysis
GO analysis showed that DEGs were mainly related to biological process (BP) including inflammatory response, lymphocyte activation, response to bacterium, response to lipopolysaccharide, leukocyte proliferation, leukocyte migration, response to molecules of bacterial origin, and cellular response to biotic stimulus ( Figure 2B). The cellular component (CC) aspect of the results of GO analysis of DEGs included external encapsulating, extracellular matrix, collagen containing, external side of plasma membrane, basolateral plasma membrane, basal part of cells, and apical plasma membrane ( Figure 2C). The molecular function (MF) aspect of the GO analysis results included active transmembrane transporter activity, symporter activity, secondary active transmembrane transporter activity, active ion transmembrane transporter activity, solute cation symporter activity, solute sodium symporter activity, sodium ion transmembrane transporter activity, and cytokine binding ( Figure 2D). GO analysis showed that DEGs were mainly related to urinary system disease, kidney disease, mouth disease, lung disease, tooth disease, myeloma, bone marrow cancer, and multiple myeloma ( Figure 2E). KEGG pathway enrichment showed that human papilloma virus infection pathway, chemokine signaling pathway, pathways in cancer, Epstein-Barr virus infection pathway, proteoglycans in cancer pathway, human cytomegalovirus infection pathway, human T-cell leukemia virus 1 infection pathway, Rap 1 signaling pathway, Wnt signaling pathway, and axon guidance pathway as the top-10 most upregulated pathways (Figure 2F). These results indicating MN pathophysiological changes involved immune responses.

Screening of Biomarkers for MN
The LASSO logistic regression algorithm was employed to identify 23 genes from DEGs as biomarkers for MN combined results from two λ values (Figure 3A). The protein-protein interaction (PPI) was analyzed ( Figure 3B); 30 genes were obtained through use of the random forest algorithm as MN biomarkers ( Figure 3C) and PPI analysis ( Figure 3D); the learning status of the random forest algorithm showed that the predictive results were useful when scanning MN patients (Figure 3E). The gene markers from two algorithms were overlapped and one related gene (ETS2) was obtained ( Figure 3F).

Immune Cell Infiltration Results and Correlation Analysis Between Ets2 and Immune Cells
The CIBERSORT tool was used to identify the estimated proportion of immune cells in MN and normal tissues. The estimated proportion of immune cells for each sample is shown in Figure 4A. The immune cell composition in both MN and normal groups was calculated; results showed that there were statistical differences in monocytes, plasma cells, regulatory T cells, and memory B cells (Figure 4B). The correlation between infiltrated immune cells is shown in Figure 4C. The correlation between ETS2 and infiltered immune cells was also determined.

DISCUSSION
MN is one of the most common pathological phenotypes for adult NS (3). Treatment for MN includes Rituximab, immunosuppressants, and glucocorticoids (10). Although, some patients can self-heal, many suffer from a protracted course of MN and some would eventually progress to end-stage-kidneydisease and require a kidney transplant.
Immune infiltration plays an important role in MN development (15). B cell activation leads to deposition of immune complex causing classic pathological changes of MN. Immune complex and B cell activation also leads to immune cell infiltration by macrophages. Macrophage phenotypes also    Frontiers in Medicine | www.frontiersin.org play a critical role in the progress of MN; MN patients had a higher ratio of M1 macrophages to M2 phenotype, which led to increased inflammation and caused more damage (8).
We downloaded the expression profile dataset from the GEO database and identified a total of 463 DEGs. GO enrichment analysis showed that DEGs were mainly related to inflammatory response, lymphocyte activation, response to bacterium, response to lipopolysaccharide, leukocyte proliferation, leukocyte migration, response to molecules of bacterial origin, and cellular response to biotic stimulus. These results show that immune activity plays an important role in MN processes. KEGG enrichment analysis showed that up-regulated DEGs mainly belonged to the human papilloma virus infection pathway, chemokine signaling pathway, pathways in cancer, and Epstein-Barr virus infection pathway. These results showed that immune infiltration and immunomodulatory genes were changed during MN pathology changes. Also, targeting immune infiltration and virus-related innate immune could be next generation treatments (15) and diagnostic methods for MN (16).
CIBERSORT analysis result of 22 immune cell infiltration showed the statistical differences in monocytes, plasma cells, regulatory T cells, and memory B cells and indicated that pathways involved in immune cell activation and infiltration could be critical in MN progression. Moreover, these results demonstrated that these immune cells could be viable targets for MN immune suppression treatment.
The expression data were analyzed by using machine learning algorithms to screen possible targets for MN treatment. LASSO and random forest algorithms were used during the scan for treatment targets. We then combined the results from the LASSO algorithm and the random forest algorithm, showing that ETS2 was a suitable target for MN treatment. ETS2 was found to be positively related to monocytes, M1 phase macrophages, and neutrophils and negatively correlated to plasma cells, CD4+ T memory cells, M2 macrophages, CD8+ T cells, memory B cells, and resting mast cells. These results indicated that ETS2 participated in MN pathophysiological immune infiltration. According to previous studies, ETS2 participated in various activities: EST2 can control cytokine production and innate immune activation, which leads to IL-6 suppression and decreased amounts of macrophages (17). These functions of ETS2 could be related to its suppression of MAPK/NF-KB pathways (18). Also, deletion of ETS2 in pancreas leads to fibroblast continuous activation (17), however, studies in kidney showed that ETS2 could promote epithelialto-mesenchymal transition and cause the progression of renal fibrosis. The study also indicated that these results could be caused by direct levels of regulation between ETS2 and JUNB transcription (19). Moreover, some studies showed that ETS2 interacts with the TGF-β/Smad pathway (20)-a pathway related to renal fibrosis-but studies of ETS2 function in MN are needed.
These results demonstrated that ETS2 was a crucial component for MN activation and immune infiltration, but direct intervention with ETS2 transcription function is not a wise way in which to treat MN. Further studies should be aimed at understanding of ETS2 function in MN and interaction with ETS2 promoting JUNB transaction. Inhibiting ETS2 and JUNB promotor binding could be the next-generation treatment used to cure MN. However, there are some limitations of this study, such as lack of the validations of Ets2 in MN patients at the protein level and Ets2 expression level's relation with patients' clinical outcomes. Further studies in this area are needed.

DATA AVAILABILITY STATEMENT
Publicly available datasets were analyzed in this study. This data can be found here: GEO Database; GSE99340; GSE108113.