Identification of immune-related gene markers (CCL2, CXCR2, S100A9) and immune infiltration characteristics of inflammatory bowel disease and heart failure via bioinformatics analysis and machine learning

Background Recently, heart failure (HF) and inflammatory bowel disease (IBD) have come to be regarded as related diseases with increasing incidence rates, and both are linked to immunity. This study aimed to identify immune-related gene (IRG) markers shared by HF and IBD through bioinformatics and machine learning (ML) methods and to explore their immune infiltration characteristics. Methods Gene expression data (GSE120895, GSE21610, GSE4183) from the Gene Expression Omnibus (GEO) database were used to screen differentially expressed genes (DEGs), which were intersected with IRGs from the ImmPort database to obtain differentially expressed immune-related genes (DIRGs). Functional enrichment analysis of the DIRGs was performed using Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses. Subsequently, three machine learning models and a protein-protein interaction (PPI) network were used to identify diagnostic biomarkers. Receiver operating characteristic (ROC) curves were applied to evaluate the diagnostic value of the candidate biomarkers in the validation sets (GSE1145, GSE36807), and the correlations of the biomarkers with immune cells were assessed with the Spearman method. Finally, the CIBERSORT algorithm was used to evaluate immune cell infiltration in the two diseases. Results Thirty-four DIRGs were screened, and GO and KEGG analyses showed that these genes are mainly related to inflammatory and immune responses. CCL2, CXCR2 and S100A9 were identified as biomarkers. In both diseases, CCL2 was positively correlated with activated mast cells, CXCR2 with neutrophils, and S100A9 with neutrophils and activated mast cells. Immune infiltration analysis showed that M2 macrophages, M0 macrophages and neutrophils were altered in both diseases.
Conclusions CCL2, CXCR2 and S100A9 are promising immunogenetic biomarkers for diagnosing comorbid HF and IBD. M2 macrophages, M0 macrophages, and neutrophil-mediated inflammation and immune regulation may play important roles in the development of HF and IBD and may become diagnostic and therapeutic targets.


Introduction
Heart failure (HF) is defined as any structural and/or functional impairment of cardiac ejection that leads to a complex clinical syndrome with typical symptoms and signs (1), and it is the terminal stage of various cardiovascular diseases. HF has high incidence and mortality and leads to poor functional capacity, reduced quality of life and high costs. HF affects over 64 million people worldwide (2). In view of the great burden of chronic HF on society, a deeper understanding of its pathophysiological mechanisms is needed, not only because of its mortality but also because of the morbidity associated with repeated and long-term hospitalization, which merits accelerated research (3). The occurrence of HF is associated with cardiovascular aging (4), and risk factors include old age, hypertension, diabetes, dyslipidemia, obesity (5), volume overload, fluid congestion, hereditary cardiomyopathy and others (6). With the ongoing exploration of the pathophysiological mechanisms of HF and the publication of evidence-based clinical research, inflammation and immunity, including autoimmune and infection-mediated mechanisms, are now considered additional possible mechanisms of HF (7)(8)(9).
Inflammatory bowel disease (IBD) is a common and complex group of autoimmune diseases (10) that includes ulcerative colitis and Crohn's disease, both chronic diseases of the gastrointestinal tract (11). In recent years, the incidence and prevalence of IBD have risen sharply in both developed and developing countries (12), resulting in a significant global disease burden (13). The etiology and pathogenesis of IBD remain uncertain and may involve genetic factors, environmental factors, intestinal barrier dysfunction, imbalance of the gut microbiota, and exaggerated innate and adaptive immune responses (14,15). IBD is a multisystem disease that primarily affects the gastrointestinal tract and can also involve the musculoskeletal system, eyes, skin and cardiovascular system (16).
Recent research shows that IBD is associated with cardiovascular disease, and HF events in patients with IBD affect the disease course and prognosis (17). Patients with chronic inflammatory diseases, including IBD, are more likely to develop atherosclerotic cardiovascular disease, HF and atrial fibrillation (18). The interaction between IBD and HF is complex, and immune-mediated pathophysiological changes are a common feature of the two diseases.
Both diseases involve interactions among, and imbalances of, immune cells (19,20), which play a crucial role in their occurrence and development. It is therefore necessary to identify immune biomarkers common to both diseases, evaluate immune cell infiltration and determine changes in immune cell composition, in order to elucidate the molecular mechanisms of HF and IBD and to develop new immunotherapy targets.
Bioinformatics serves as a bridge between computer science and medicine. Machine learning (ML) is a field of computer science in which computers simulate human learning, continuously improving their performance on learning tasks by discovering patterns in data (21). Applying bioinformatics and ML methods in medicine can help us better understand the pathophysiological mechanisms of diseases, screen for disease-specific biomarkers and gain deeper insight into diseases. Therefore, this study combined bioinformatics and ML methods with the construction of protein-protein interaction (PPI) networks to screen for immune-related gene markers common to both diseases, analyze immune cell infiltration in HF and IBD, and explore the immune-related mechanisms and targets of HF and IBD. The results may provide a new approach for the diagnosis and treatment of HF and IBD.

Data downloading and processing
The microarray datasets GSE120895 and GSE21610 (HF) and GSE4183 (IBD) were downloaded from the Gene Expression Omnibus (GEO) database (http://www.ncbi.nlm.nih.gov/geo/) as experimental sets, and the HF dataset GSE1145 and the IBD dataset GSE36807 were downloaded as validation sets. Immune gene data were downloaded from the ImmPort database (https://www.immport.org/). R 4.3.0 was used to screen differentially expressed genes (DEGs) between HF and normal samples in GSE120895 and GSE21610 and between IBD and normal samples in GSE4183. The screening criteria were |log2 fold change| > 1 and adjusted P < 0.05. The DEGs shared by the two diseases were submitted to DAVID (https://david.ncifcrf.gov) for Gene Ontology (GO) analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis. The workflow is shown in Figure 1.
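The DEG screening criteria can be sketched as follows. This is a minimal Python illustration, not the authors' R code: it assumes per-gene log2 fold changes and raw P values have already been computed (for example by a limma-style moderated t test), and it assumes the Benjamini-Hochberg procedure as the P-value correction, which the text does not specify. The function names are hypothetical.

```python
def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])  # indices by ascending P
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def select_degs(genes, log2fc, pvals, fc_cut=1.0, p_cut=0.05):
    """Split genes into up- and downregulated DEGs (|log2FC| > fc_cut, adjusted P < p_cut)."""
    adjusted = benjamini_hochberg(pvals)
    up, down = [], []
    for gene, fc, q in zip(genes, log2fc, adjusted):
        if q < p_cut and abs(fc) > fc_cut:
            (up if fc > 0 else down).append(gene)
    return up, down


# Toy example with made-up values (not data from this study).
up, down = select_degs(
    ["GENE_A", "GENE_B", "GENE_C", "GENE_D"],
    [2.0, -1.5, 0.5, 3.0],
    [0.001, 0.002, 0.03, 0.5],
)
print(up, down)  # ['GENE_A'] ['GENE_B']
```

The sign of the fold change decides the direction; GENE_C passes the adjusted P threshold but is excluded because its fold change is too small.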

Immune-related gene acquisition and functional correlation analysis
After confirming that the overlapping genes of the two diseases were closely related to inflammation and immunity, the DEGs from the HF and IBD datasets were intersected with the immune genes to obtain immune-related genes common to both diseases. GO and KEGG pathway analyses of these immune-related genes were then performed to explore the biological functions and signaling pathways in which they are involved.
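The intersection step amounts to set operations on gene symbols. In the sketch below, the gene lists are toy placeholders (only the three gene symbols come from this study) and the function name is hypothetical. The intersection ignores the direction of change, which is consistent with CCL2 being retained despite opposite regulation in HF and IBD.

```python
def shared_immune_genes(hf_degs, ibd_degs, immune_genes):
    """Genes that are DEGs in both diseases and also appear in the immune gene list."""
    overlapping = set(hf_degs) & set(ibd_degs)  # DEGs shared by HF and IBD
    return sorted(overlapping & set(immune_genes))


hf = ["CCL2", "CXCR2", "S100A9", "GENE_X"]
ibd = ["CCL2", "CXCR2", "S100A9", "GENE_Y"]
immune = ["CCL2", "CXCR2", "S100A9", "GENE_Z"]
print(shared_immune_genes(hf, ibd, immune))  # ['CCL2', 'CXCR2', 'S100A9']
```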

Screening of diagnostic markers
Four methods were used for feature marker selection: LASSO regression, random forest (RF), support vector machine-recursive feature elimination (SVM-RFE) and the maximal clique centrality (MCC) algorithm of the cytoHubba plug-in. The LASSO regression model, a multigene-based classifier (22), was implemented with the R package glmnet. It is a regression algorithm for high-dimensional variables that adds an L1 penalty to a (generalized) linear model, shrinking the coefficients of uninformative genes to exactly zero.
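As a rough Python analogue of the glmnet step (an assumption; the study used R), an L1-penalized logistic regression with a cross-validated penalty strength can be fitted with scikit-learn, and genes whose coefficients remain non-zero are kept. The synthetic data, the function name `lasso_select` and the use of AUC as the cross-validation score are illustrative assumptions, not settings reported by the authors.

```python
import numpy as np
from sklearn.linear_model import LogisticRegressionCV
from sklearn.preprocessing import StandardScaler


def lasso_select(X, y, genes, folds=10, seed=0):
    """Return genes with non-zero coefficients in an L1-penalized logistic model.

    X: samples x genes expression matrix; y: 0 = control, 1 = disease.
    """
    X_std = StandardScaler().fit_transform(X)  # put genes on a common scale
    model = LogisticRegressionCV(
        Cs=20,               # grid of 20 inverse penalty strengths
        cv=folds,            # k-fold cross-validation chooses the penalty
        penalty="l1",        # LASSO penalty: drives some coefficients to zero
        solver="liblinear",  # a solver that supports the L1 penalty
        scoring="roc_auc",
        random_state=seed,
    )
    model.fit(X_std, y)
    coefs = model.coef_.ravel()
    return [g for g, c in zip(genes, coefs) if c != 0.0]


# Synthetic data: only the first two "genes" carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(80, 10))
y = (X[:, 0] - X[:, 1] + 0.3 * rng.normal(size=80) > 0).astype(int)
genes = [f"g{i}" for i in range(10)]
print(lasso_select(X, y, genes))
```

glmnet users often pick `lambda.min` or `lambda.1se`; scikit-learn's `LogisticRegressionCV` simply keeps the best-scoring penalty, so the selected gene set may differ slightly from an R analysis.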
The RF algorithm is an ensemble learning algorithm based on decision trees. Each tree is built from a random subset of samples and features, and the predictions of all trees are combined by voting to obtain the final prediction (23). RF in R was used to rank the immune-related genes of IBD and HF. Using tenfold cross-validation (error ± standard deviation), the number of features with the smallest error was selected, and the corresponding genes were identified as feature genes. The R package e1071 was used to construct the SVM classifier, and the SVM-RFE algorithm was used for feature selection with tenfold cross-validation (24). STRING 11.5 (https://cn.string-db.org/) was used to construct the protein-protein interaction (PPI) network, and the MCC algorithm of the cytoHubba plug-in was then used to identify hub genes. Finally, the genes selected by all four methods were retained for further analysis.
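The RF and SVM-RFE steps and the final overlap can be approximated in Python with scikit-learn. This is a hedged sketch, not the study's R workflow (randomForest/e1071): the RF part ranks genes by impurity-based importance and then, over a halving sequence of top-k subsets (loosely mirroring the default step of `rfcv` in the R randomForest package), keeps the size with the lowest cross-validated error; the SVM-RFE part uses `RFECV` with a linear SVM. The PPI/MCC step is not reproduced, although its hub genes could be passed to `consensus` as a fourth list. Function names, parameters and data are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC


def rf_select(X, y, genes, folds=10, n_trees=500, seed=0):
    """Rank genes by RF importance and keep the top-k subset with the lowest CV error."""
    cv = StratifiedKFold(n_splits=folds, shuffle=True, random_state=seed)
    forest = RandomForestClassifier(n_estimators=n_trees, random_state=seed).fit(X, y)
    ranked = np.argsort(forest.feature_importances_)[::-1]  # most important first
    best_k, best_err = len(genes), np.inf
    k = len(genes)
    while k >= 1:  # try p, p/2, p/4, ... top-ranked genes
        model = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
        err = 1.0 - cross_val_score(model, X[:, ranked[:k]], y, cv=cv).mean()
        if err <= best_err:  # on ties, prefer the smaller gene set
            best_k, best_err = k, err
        k //= 2
    return [genes[i] for i in ranked[:best_k]]


def svm_rfe_select(X, y, genes, folds=10, seed=0):
    """Recursive feature elimination with a linear SVM; subset size chosen by CV."""
    cv = StratifiedKFold(n_splits=folds, shuffle=True, random_state=seed)
    selector = RFECV(SVC(kernel="linear"), step=1, cv=cv, scoring="accuracy").fit(X, y)
    return [g for g, keep in zip(genes, selector.support_) if keep]


def consensus(*gene_lists):
    """Genes selected by every method."""
    return sorted(set.intersection(*(set(g) for g in gene_lists)))


rng = np.random.default_rng(1)
X = rng.normal(size=(60, 10))
y = (X[:, 0] > 0).astype(int)  # synthetic: only g0 is informative
genes = [f"g{i}" for i in range(10)]
print(consensus(rf_select(X, y, genes, folds=5, n_trees=100),
                svm_rfe_select(X, y, genes, folds=5)))
```

Ranking genes on the full dataset before cross-validation makes the RF error estimate optimistic; a stricter variant would recompute importances inside each fold.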

Verification of diagnostic markers
The HF dataset GSE1145 and the IBD dataset GSE36807 were used as validation sets. The diagnostic validity of the selected characteristic genes was evaluated with receiver operating characteristic (ROC) curves using the ROCR package of R and MedCalc software, and the area under the ROC curve (AUC) was used to quantify diagnostic efficacy. Statistical analysis was conducted using GraphPad Prism 9.5.1. Unpaired t tests were used to compare expression levels between disease and normal samples, with P < 0.05 considered statistically significant, and the results were visualized as box plots. Genes that differed significantly between groups in the validation sets and performed well on ROC analysis were taken as the final screening results of this study and regarded as candidate immune-related diagnostic genes for HF and IBD.
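The empirical AUC computed by ROC tools such as ROCR equals the probability that a randomly chosen disease sample scores higher than a randomly chosen control, with ties counted as one half. A minimal pure-Python sketch of this Mann-Whitney formulation follows; the function name and toy values are hypothetical. For a gene that is downregulated in disease (as CCL2 is in HF), the negated expression would be used as the score, otherwise the AUC falls below 0.5.

```python
def roc_auc(scores, labels):
    """AUC as P(score of a positive > score of a negative), ties counted as 0.5.

    labels: 1 for disease samples, 0 for normal samples.
    """
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    if not pos or not neg:
        raise ValueError("both classes are required to compute an AUC")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy expression values for one gene (not study data).
expr = [8.1, 7.4, 6.9, 5.2, 5.8, 6.0]
labels = [1, 1, 1, 0, 0, 0]
print(roc_auc(expr, labels))                # 1.0
print(roc_auc([-v for v in expr], labels))  # 0.0 (direction flipped)
```

A threshold such as AUC > 0.7 can then be applied per gene and per validation dataset.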

Evaluation and correlation analysis of infiltration-related immune cells
The HF datasets GSE21610 and GSE120895, the IBD dataset GSE4183 and CIBERSORT's LM22 immune cell signature were used to study immune cell infiltration. The CIBERSORT program in R was used to estimate immune cell infiltration, retaining results with P < 0.05. The ggplot2 package in R was then used to draw box plots comparing immune cell infiltration between disease and normal samples for each disease. Spearman correlations between the immune-related biomarkers and the 22 infiltrating immune cell types were calculated and plotted with the ggcorrplot package.
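CIBERSORT estimates cell-type fractions by regressing each bulk expression profile on a signature matrix (LM22 in this study) with linear nu-support vector regression: it tries nu = 0.25, 0.5 and 0.75, sets negative weights to zero, normalizes the remaining weights to sum to 1 and keeps the nu with the lowest RMSE. The sketch below reproduces only that core with scikit-learn's `NuSVR` on a toy signature matrix; it omits CIBERSORT's permutation-based P value, its optional quantile normalization and the LM22 file itself, and it assumes non-log expression values. The `spearman` helper computes the rank correlation between a marker gene and estimated fractions. All names and data are hypothetical.

```python
import numpy as np
from sklearn.svm import NuSVR


def deconvolve(signature, mixture, nus=(0.25, 0.5, 0.75)):
    """Estimate cell-type fractions for one sample (CIBERSORT-style core only).

    signature: genes x cell_types reference matrix; mixture: expression per gene.
    """
    # Standardize the signature matrix and the mixture, each with one mean and SD.
    X = (signature - signature.mean()) / signature.std()
    y = (mixture - mixture.mean()) / mixture.std()
    best_rmse, best_frac = np.inf, None
    for nu in nus:
        svr = NuSVR(nu=nu, C=1.0, kernel="linear").fit(X, y)
        w = np.clip(svr.coef_.ravel(), 0.0, None)  # negative weights set to 0
        if w.sum() == 0:
            continue
        frac = w / w.sum()  # fractions sum to 1
        rmse = np.sqrt(np.mean((X @ frac - y) ** 2))
        if rmse < best_rmse:
            best_rmse, best_frac = rmse, frac
    return best_frac


def average_ranks(x):
    """Ranks starting at 1, with tied values given their average rank."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x))
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and x[order[j + 1]] == x[order[i]]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def spearman(a, b):
    """Spearman correlation = Pearson correlation of the ranks."""
    return float(np.corrcoef(average_ranks(a), average_ranks(b))[0, 1])


rng = np.random.default_rng(0)
signature = rng.uniform(1, 10, size=(50, 3))  # toy: 50 genes x 3 cell types
true_frac = np.array([0.6, 0.3, 0.1])
print(deconvolve(signature, signature @ true_frac).round(2))
print(spearman([1, 2, 3, 4], [10, 20, 30, 40]))
```

In practice, `deconvolve` would be run per sample, and `spearman` would then be applied across samples to a biomarker's expression and each cell type's estimated fraction.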

Chip data acquisition
The HF experimental dataset GSE21610 includes 60 HF samples and 8 normal samples, and GSE120895 includes 47 samples of HF due to dilated cardiomyopathy and 8 normal samples. The IBD dataset GSE4183 includes 15 IBD samples and 8 normal samples. After the HF datasets GSE120895 and GSE21610 were merged in R 4.3.0 and batch effects were removed, comparison with normal samples yielded 1,332 DEGs, including 924 upregulated and 408 downregulated genes (Figure 2A). A total of 1,640 DEGs were obtained for IBD, including 1,059 upregulated and 581 downregulated genes compared with controls (Figure 2B). A total of 2,498 immune genes were obtained from the ImmPort database. Comparison of the HF and IBD DEGs yielded 144 overlapping genes (Figure 2C), which were subjected to GO and KEGG analyses. GO analysis (Figure 2D) showed that the overlapping genes were mainly enriched in immune response, inflammatory response, chemokine activity, signal transduction and related terms. Consistent with the GO results, KEGG analysis (Figure 2E) showed significant enrichment in cytokine-cytokine receptor interaction and signal transduction-related pathways. In addition, the chemokine signaling pathway and the tumor necrosis factor signaling pathway were closely related to both diseases, indicating that both diseases are related to inflammation and immunity.

Acquisition of immune-related genes and GO and KEGG analysis
By intersecting the overlapping genes of HF and IBD with the immune genes, 36 immune-related genes were obtained. BMP4 and SERPINA3 were excluded after screening in the three datasets, leaving 34 immune-related genes (Table 1). GO analysis of these immune-related genes (Figure 3A) showed enrichment mainly in extracellular space, neutrophil chemotaxis, inflammatory response, extracellular region, antimicrobial humoral immune response mediated by antimicrobial peptide, chemokine-mediated signaling pathway, immune response, chemokine activity, chemotaxis, killing of cells of other organism, signal transduction, and cellular response to lipopolysaccharide. KEGG analysis (Figure 3B) showed enrichment mainly in viral protein interaction with cytokine and cytokine receptor, cytokine-cytokine receptor interaction, the IL-17 signaling pathway, the chemokine signaling pathway and the NOD-like receptor signaling pathway.

Verification of characteristic markers
We further evaluated the diagnostic value of CCL2, CDH1, CXCR2, PPBP and S100A9 using ROC curves (Figures 5A,B). CCL2, CXCR2 and S100A9 showed high accuracy in both HF (GSE1145) and IBD (GSE36807), with AUC > 0.7. Unpaired t tests (GraphPad Prism 9.5.1; P < 0.05 considered significant) comparing expression between disease and normal samples in the validation sets showed statistically significant differences (Figures 5C,D). Combining these results with the ROC analysis, CCL2, CXCR2 and S100A9 were selected. S100A9 and CXCR2 were upregulated in both diseases, whereas CCL2 was downregulated in HF and upregulated in IBD. These key genes may therefore serve as immune diagnostic markers for HF and IBD.

Immune cell infiltration
The histograms show the composition of 22 immune cell types in HF and IBD samples (Figures 6A,C). The Wilcoxon test was then used to identify significant differences in immune cell infiltration between the HF and control groups and between the IBD and control groups, with P < 0.05 considered significant (Figures 6B,D). Nineteen immune cell types differed significantly between the HF and control groups: the HF group had significantly higher levels of gamma delta T cells, neutrophils, resting memory CD4 T cells and M0 macrophages, whereas the control group had significantly higher levels of M2 macrophages, resting mast cells and CD8 T cells. Eighteen immune cell types differed significantly between the IBD and control groups: compared with controls, patients with IBD had higher levels of activated memory CD4 T cells, follicular helper T cells, neutrophils, M0 macrophages and M1 macrophages, whereas the control group had significantly higher levels of M2 macrophages, resting dendritic cells, resting memory CD4 T cells and plasma cells. Taken together, neutrophils and M0 macrophages were higher, and M2 macrophages lower, in both diseases.

Discussion
HF is closely related to persistent, chronic, low-grade sterile inflammation, both systemic and local in the heart, and to microvascular injury characterized by endothelial dysfunction, oxidative stress, myocardial remodeling and fibrosis (25). IBD is characterized by recurrent episodes of gastrointestinal inflammation caused by an abnormal immune response to the intestinal microflora (26). The relationship between the two diseases is still unclear, but immunity and inflammation play important roles in both HF and IBD. Therefore, exploring similarities in immune cell infiltration between the two diseases and investigating immune-related biomarkers shared by them may have important clinical significance for early recognition and intervention. The DEGs of the two diseases were intersected with immune genes to obtain overlapping immune-related genes, which were analyzed by GO and KEGG. The GO results were mainly related to neutrophil chemotaxis, inflammatory response, cathelicidin-mediated humoral immune response, immune response, chemokine activity and chemotaxis. The KEGG results were related to viral protein interaction with cytokine and cytokine receptor, cytokine-cytokine receptor interaction, the IL-17 signaling pathway, the chemokine signaling pathway, the NOD-like receptor signaling pathway, rheumatoid arthritis, hematopoietic cell lineage, amoebiasis, the tumor necrosis factor signaling pathway, neuroactive ligand-receptor interaction and influenza A.
Inflammatory cytokines such as tumor necrosis factor and various chemokines may stimulate the recruitment of proinflammatory leukocytes, promote myocardial injury, fibrotic remodeling and dysfunction, and promote the occurrence and development of IBD (19,27). Cathelicidins, a family of antimicrobial peptides (AMPs), are low-molecular-weight proteins with broad-spectrum antibacterial and immunomodulatory activities that help fight infectious bacteria (28) and play key roles in maintaining tolerance to the intestinal microbiota and preventing intestinal infection (29). Gen Long Xue et al. speculated that IL-17 signaling acts through NF-κB to inhibit the expression of SERCA2a and Cav1.2 in HF, thereby impairing cardiomyocyte contraction and promoting structural remodeling, and thus participating in the development of HF (30). NOD-like receptor signaling is related to myocardial remodeling and cardiomyocyte hypertrophy in HF (31,32). Overexpression of IL-17 plays a key role in the pathogenesis of IBD (33), and IL-17 has become a therapeutic target for multiple sclerosis, psoriasis, psoriatic arthritis, ankylosing spondylitis, rheumatoid arthritis and IBD (34,35).
By combining bioinformatics analysis and machine learning, CCL2, CXCR2, S100A9, PPBP and CDH1 were screened. After validation, CCL2, CXCR2 and S100A9 were selected as immune-related gene markers for the two diseases. CCL2, also known as monocyte chemoattractant protein-1 (MCP-1), belongs to the CC chemokine family (36), whose members are small (5-20 kDa) basic heparin-binding proteins that share 20%-70% amino acid sequence homology and are characterized by four conserved cysteine residues (37). CCL2 is produced by many cell types, including endothelial cells, fibroblasts, epithelial cells, smooth muscle cells, mesangial cells, astrocytes, monocytes and microglia (38,39). CCL2 can promote cardiac decompensation (40) and is upregulated in stressed and injured cardiomyocytes (41). TFRC in cardiomyocytes recruits and activates macrophages through CCL2 secretion, inducing myocardial hypertrophy and promoting HF development (42). Daniela Impellizzeri et al. found that the proinflammatory mediator CCL2 was significantly upregulated in colonic inflammation and associated with increased disease activity (43). CXCR2 is a chemokine receptor classified as a G protein-coupled receptor (GPCR) belonging to the CXCR  46). In IBD, blocking CXCR2 signaling is a potential therapeutic strategy for preventing the disease (48). CXCR2 has become a therapeutic target in many inflammatory diseases, such as chronic obstructive pulmonary disease (COPD), allergic asthma, gram-negative sepsis, IBD, lung injury, autoimmune diseases and cancer (45,49). S100A9, also known as myeloid-related protein 14 (MRP14) or calgranulin B, belongs to the calcium-binding S100 protein family and is mainly expressed in neutrophils; under the influence of various chronic inflammatory factors, it can also be expressed in vascular endothelial cells and mature macrophages (50). S100A9 has pleiotropic effects on the cardiovascular system. Xuan Wei et al. found that myocardial hypertrophic preconditioning can reduce myocardial fibrosis and cardiomyocyte hypertrophy by upregulating S100A9 (51). Marinkovic et al. found that short-term S100A9 blockade improves cardiac function after permanent myocardial ischemia in mice (52). Another study found that S100A8 and S100A9 mRNA were differentially expressed in blood leukocytes of patients with IBD compared with healthy controls, and in active compared with quiescent disease, suggesting that S100A9 could serve as a potential biomarker of IBD (53). These findings support the potential of CCL2, CXCR2 and S100A9 as immune-related biomarkers of the two diseases.
CIBERSORT was applied to evaluate immune infiltration in HF and IBD. Resting memory CD4 T cells, M0 macrophages, M2 macrophages and neutrophils differed significantly from controls in both diseases: neutrophils and M0 macrophages were higher, and M2 macrophages lower, in both. These immune cells may be related to the occurrence and development of both diseases. Neutrophils are the main phagocytes in circulating blood; they reach inflammatory sites through a cascade of steps that leads to activation of specific effector functions (54). Increased numbers and sustained activation of neutrophils are major determinants of the overactivated inflammation in acute HF and of long-term outcomes in chronic HF (55). Macrophages are the main immune cell population in resting cardiac tissue and are located around interstitial and endothelial cells (56). The M2 phenotype is involved in tissue repair and immune tolerance, helps maintain organs and soft tissues and regulate immune balance (57), and exhibits cardioprotective effects in HF (58). In the gastrointestinal tract, macrophages are considered to play an important role in maintaining intestinal homeostasis and are key sentinels of the intestinal immune system (59). There is evidence suggesting a causal relationship between impaired resolution of intestinal inflammation and altered monocyte-to-macrophage differentiation in patients with IBD (60).
Regarding the correlation between immune-related biomarkers and immune cells, CCL2 was positively correlated with activated mast cells and negatively correlated with resting mast cells in both HF and IBD. CXCR2 was positively correlated with neutrophils, and S100A9 was positively correlated with neutrophils and activated mast cells and negatively correlated with resting dendritic cells and resting mast cells. In the intestinal mucosa, CCL2 attracts mast cells (MCs), and MC activation at the mucosal surface mediates a severe inflammatory response (61). In HF, mast cell activation induces the production of CCL2 (62). CXCR2 and S100A9 are neutrophil-related biomarkers and potential therapeutic targets in ulcerative colitis (63). Averill et al. found that S100A9 can alter the phenotypic state of neutrophils, macrophages and dendritic cells to varying degrees and is related to inflammation, IBD, obesity and cardiovascular disease (48,64,65). These results point to interactions between the marker genes and immune cells.
This study used machine learning and bioinformatics methods to explore the immune-related mechanisms, infiltrating immune cells and immune-related biomarkers of HF and IBD. In addition to investigating the functions of immune-related genes with GO and KEGG, we used RF, LASSO regression and the SVM-RFE algorithm to identify characteristic genes, and CIBERSORT to assess immune cell infiltration in the two diseases. However, this study has several limitations. First, the results have not been validated in large clinical samples or in basic experiments. Second, there are few high-quality studies on the relationship between IBD and HF.

Conclusion
This study identified C-C motif chemokine 2 (CCL2), C-X-C chemokine receptor type 2 (CXCR2) and protein S100-A9 (S100A9) as immune-related diagnostic markers for HF and IBD. Neutrophils, M0 macrophages and M2 macrophages may be involved in the occurrence and development of both diseases. In addition, CCL2 was positively correlated with activated mast cells in both HF and IBD, CXCR2 with neutrophils, and S100A9 with neutrophils and activated mast cells. In summary, these immune cells and immune-related diagnostic markers may have a significant impact on the development of HF and IBD. Studying the immune cells involved in IBD and HF and identifying relevant diagnostic markers may reveal targets for immunotherapy, providing ideas for the immunomodulatory treatment of these two diseases in clinical practice.
Frontiers in Cardiovascular Medicine, frontiersin.org

FIGURE 2 Volcano plots and GO and KEGG analyses of overlapping genes. (A) Volcano plot of HF. Blue dots represent downregulated genes, black dots represent nonsignificant genes, and red dots represent upregulated genes. (B) Volcano plot of IBD. Blue dots represent downregulated genes, black dots represent nonsignificant genes, and red dots represent upregulated genes. (C) Venn diagram of the overlap between HF DEGs and IBD DEGs. (D) GO functional enrichment of the overlapping genes of HF and IBD. The dot size reflects the number of enriched genes, and the color indicates the significance of enrichment. (E) KEGG pathways of the overlapping genes of HF and IBD. The dot size reflects the number of enriched genes, and the color indicates the significance of enrichment.

FIGURE 3 KEGG and GO diagrams of immune-related genes. (A) GO functional enrichment of immune-related genes. The dot size reflects the number of enriched genes, and the color indicates the significance of enrichment. (B) KEGG pathways of immune-related genes. The dot size reflects the number of enriched genes, and the color indicates the significance of enrichment.

FIGURE 4 Machine learning models and immune-related gene network. (A) Least absolute shrinkage and selection operator (LASSO) logistic regression used to screen diagnostic markers. (B) Different colors represent different genes. (C) Cross-validation error ± standard deviation in the random forest model. The red dashed line marks the number of genes with the smallest error. (D) Hub genes selected by the random forest model; bar length represents gene importance. (E) ROC curve of the random forest model. (F) Minimum root mean square error graph for feature gene selection by SVM-RFE. (G) Hub genes selected by SVM; bar height represents gene importance. (H) Top 10 hub genes of the immune-related gene network. (I) Overlapping genes of the four algorithms.

FIGURE 6 Immune cell infiltration.(A) Evaluation of infiltration of 22 types of immune cells in HF samples.(B) Violin diagram of the proportions of 22 types of immune cells in HF. (C) Evaluation of 22 types of immune cell infiltration in IBD samples.(D) Violin plot of the proportions of 22 types of immune cells in IBD.

FIGURE 7 Correlation between diagnostic markers and infiltrating immune cells.(A) Correlation between CCL2 and infiltrating immune cells in HF. (B) Correlation between CXCR2 and infiltrating immune cells in HF. (C) Correlation between S100A9 and infiltrating immune cells in HF. (D) Correlation between CCL2 and infiltrating immune cells in IBD.(E) Correlation between CXCR2 and infiltrating immune cells in IBD.(F) Correlation between S100A9 and infiltrating immune cells in IBD.The size of the dots represents the strength of the correlation between genes and immune cells; the larger the dots, the stronger the correlation, and the smaller the dots, the weaker the correlation.The color of the dots represents the P value; the greener the color, the lower the P value; and the redder the color, the larger the P value.P < 0.05 was considered statistically significant.

TABLE 1 Information on immune-related genes.