Integrative Identification of Hub Genes Associated With Immune Cells in Atrial Fibrillation Using Weighted Gene Correlation Network Analysis

Background: Atrial fibrillation (AF) is the most common tachyarrhythmia in the clinic, leading to high morbidity and mortality. Although many studies on AF have been conducted, the molecular mechanism of AF has not been fully elucidated. This study was designed to explore the molecular mechanism of AF using integrative bioinformatics analysis and provide new insights into the pathophysiology of AF. Methods: The GSE115574 dataset was downloaded, and Cibersort was applied to estimate the relative expression of 22 kinds of immune cells. Differentially expressed genes (DEGs) were identified through the limma package in R language. Weighted gene correlation network analysis (WGCNA) was performed to cluster DEGs into different modules and explore relationships between modules and immune cell types. Functional enrichment analysis was performed on DEGs in the significant module, and hub genes were identified based on the protein-protein interaction (PPI) network. Hub genes were then verified using quantitative real-time polymerase chain reaction (qRT-PCR). Results: A total of 2,350 DEGs were identified and clustered into eleven modules using WGCNA. The magenta module with 246 genes was identified as the key module associated with M1 macrophages with the highest correlation coefficient. Three hub genes (CTSS, CSF2RB, and NCF2) were identified. The results verified using three other datasets and qRT-PCR demonstrated that the expression levels of these three genes in patients with AF were significantly higher than those in patients with SR, which were consistent with the bioinformatic analysis. Conclusion: Three novel genes identified using comprehensive bioinformatics analysis may play crucial roles in the pathophysiological mechanism in AF, which provide potential therapeutic targets and new insights into the treatment and early detection of AF.


INTRODUCTION
Atrial fibrillation, the most common sustained arrhythmias in the clinic (1), affects ∼1 to 2% of the population (2), which increases the morbidity, mortality, and medical burden worldwide (3). The risk of stroke in patients with AF is approximately five-fold higher than that in healthy people (4). AF is a complex disease (5) associated with common risk factors, including advancing age, male sex, hypertension, obesity, and diabetes (6). However, the pathophysiological mechanism leading to AF is still unclear. Increasing evidence demonstrates that immune cells play a significant role in the pathogenesis of AF (7). Several immunemediated serum inflammatory markers such as CRP and IL-6 have been confirmed to be elevated in patients with AF (8,9). Nevertheless, the association between immune cells and the biological molecular mechanism of AF still needs further research to clarify the pathogenesis of AF and find potential therapeutic targets.
Weighted gene correlation network analysis (WGCNA) is a system biology method used to cluster genes into modules according to expression patterns among different samples (10). Based on the interconnectivity of genes, WGCNA can explore the relationships between gene modules and the clinical phenotypes to identify candidate biomarkers or therapeutic targets. It has been applied to numerous kinds of diseases (11)(12)(13)(14).
In this study, we aimed to explore the association between immune cells and AF using comprehensive bioinformatics analysis. Cibersort was used to evaluate immune cell composition, and WGCNA was used to identify the hub gene module. We then analyzed the functional enrichment of the genes in the hub module. Based on the protein-protein interaction (PPI) network, hub genes were identified for further analysis and validation. We hope that this research can provide potential targets and new research ideas for the treatment of AF.

Data Acquisition and Processing
The GSE115574 dataset, including fourteen gene expression profiles from left atrial tissues of AF patients and fifteen gene expression profiles from left atrial tissues of sinus rhythm (SR) patients, were downloaded in the Gene Expression Omnibus (GEO) database. The R language was used to process the original expression profile of the GSE115574 dataset. Cibersort, a bioinformatics algorithm that could estimate 22 immune cell types, was applied to evaluate immune cell composition based on the gene expression matrix (15). The Linear Models for Microarray data (limma) package in the R language (16) was utilized to identify differentially expressed genes (DEGs) with a p < 0.05 between patients with AF and SR. The GSE31821, GSE41177, and GSE79768 datasets were used for validation. The batch effect was removed using the SVA package in the R language.

WGCNA
The WGCNA (10) was applied to construct the mRNA coexpression network based on the DEGs. Briefly, an appropriate

Functional Enrichment Analyses
The Database for Annotation, Visualization and Integrated Discovery (DAVID, v6.8) (17) was used to perform the Gene Ontology (GO) (18,19) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses, which revealed the biological processes (BPs), cellular components (CCs), molecular functions (MFs), and pathways related to genes in the module identified above. GO terms and KEGG maps with a p < 0.05 were considered significant enrichment.

Construction of PPI Network and Hub Genes Identification
The DEGs in the selected module were imported into the Search Tool for the Retrieval of Interacting Genes (STRING, v11.0) (20) to generate the PPI network identifying the interactions between the genes with the threshold of interaction score >0.4. Nodes represent proteins, and edges represent protein-protein associations in the PPI network. The results downloaded from the STRING database was then visualized utilizing Cytoscape software (v3.8.1). CytoHubba (21), a plug-in of the Cytoscape software, was used to identified hub genes. The intersection of five algorithms in CytoHubba was generated for hub genes identification to ensure the accuracy and robustness of the results.

Sample Collection
Adult patients with persistent AF undergoing cardiac surgery in Zhongshan Hospital were included in this study. Persistent AF is defined as AF which is continuously sustained beyond 7 days, including episodes terminated by cardioversion (drugs or electrical cardioversion) after more than seven days (3). Excluded from the research were patients with coronary artery heart disease, hypertension, diabetes, or obstructive sleep apnea syndrome, whose ejection fractions were <30%, and those who   had contraindications to surgery. The left atrium tissues and blood samples of twenty patients with persistent AF and ten healthy donors with SR were collected during cardiac surgery. The samples were then immediately preserved in liquid nitrogen for the later experiment. This study was in full compliance with the Declaration of Helsinki and approved by the Medical Ethics Committee of Zhongshan Hospital, Fudan University (Approval No. B2019-198R). All patients participating in this study have signed written informed consent before surgery.

Quantitative Real-Time Polymerase Chain Reaction (qRT-PCR)
Total left atrium tissue RNA was extracted with the RNeasy TM Mini Kit (QIAGEN, Frankfurt, Germany) following the manufacturer's instruction. The PrimeScript TM RT reagent Kit (Takara, Otsu, Japan) was used to conduct reverse transcription.
QRT-PCR was performed with the TB Green R Premix Ex Taq TM II (Takara, Otsu, Japan) on QuantStudio TM 5 System (Thermo Fisher Science, Waltham, MA, USA). The expression data was normalized by GAPDH, and the 2 − CT method was applied to analyze the results. All sequences for RNA primers (Sangon Biotech, Shanghai, China) are shown in Table 1.

Identification of DEGs
A total of 2,350 genes, including 1,115 upregulated and 1,235 downregulated, which were differentially expressed between AF samples and SR samples, were identified in the GSE115574 dataset with the limma package.

Relative Immune Cells Expression
The Cibersort with 22 types of immune cell subtypes was applied to estimate the putative relative expression of immune cells. The results are shown in Figure 1. Naïve B cells, CD4 naïve T cells, CD4 memory activated T cells, gamma delta T cells, resting NK cells, M0 macrophages, activated mast cells, eosinophils, and neutrophils were eliminated because most of the samples were not inferred to express in these immune cells. The correlation of remaining immune cells was calculated using the Pearson correlation coefficient, as shown in

Construction of the Weighted Coexpression Networks
Based on the DEGs identified above, a total of 2,350 genes were subjected to WGCNA. We then established a scale-free (scalefree R 2 > 0.85) coexpression network with the soft-thresholding power β = 8. Using the average linkage hierarchical clustering method, DEGs were clustered into eleven modules with different colors, including black, blue, brown, cyan, greenyellow, gray, lightcyan, magenta, midnightblue, tan, and turquoise (Figure 3).

Correlation Between Modules and Immune Cell Types
Correlation analysis was performed between each module and immune cell types selected above, including memory B cells, plasma cells, CD8 T cells, CD4 memory T cells, follicular helper T cells, regulatory T cells (Tregs), resting NK cells, activated NK cells, monocytes, M1 macrophages, M2 macrophages, resting dendritic cells, resting mast cells. The results demonstrated that the magenta module (r = 0.67, p < 0.001) and the blue module (r = 0.58, p = 0.008) were significantly positively correlated with M1 macrophages, while the greenyellow module (r = −0.62, p = 0.004) was significantly negatively correlated with M1 macrophages, as shown in Figure 4. The magenta module with 246 genes was identified as the key module associated with M1 macrophages with the highest correlation coefficient.

Functional Enrichment Analyses
Genes in the magenta module were selected to perform GO and KEGG functional enrichment analyses utilizing the DAVID online tool to investigate the biological effects, as shown in Figure 5. The significant enriched BPs included immune response, ureteric bud development, aorta development, extracellular matrix organization, cyclic nucleotide biosynthetic process, and calcium ion transmembrane transport. In addition, extracellular matrix, extracellular space, extracellular region,

Construction of the PPI Network and Hub Genes Identification
DEGs in the magenta module were imported into the STRING online tool to evaluate the interaction between these genes, and a total of 167 nodes and 393 edges were identified from the PPI network, as shown in Figure 6. Five algorithms of the CytoHubba, including MCC, DMNC, MNC, Degree, and EPC, were then applied to process the PPI network to identify the top ten genes, which are shown in Table 2. A Venn diagram (Figure 7) was generated to establish the intersection of genes identified by five algorithms, and CTSS, CSF2RB, and NCF2 were determined as hub genes. These three genes may play considerable roles in the pathophysiology of AF.

Validation of the Hub Genes
The expression levels of three hub genes were detected in LAs and blood samples, respectively, by qRT-PCR. The results showed that the expression levels of CTSS, CSF2RB, and NCF2 in AF were significantly higher than those in SR both in LAs and blood samples, which were consistent with the bioinformatic analysis (Figure 8). Moreover, we combined three datasets, GSE31821, GSE41177, and GSE79768, to verify the expression levels of these three genes between AF and SR. The results also demonstrated that the expression levels of CTSS, CSF2RB, and NCF2 were significantly higher in AF than those in SR (Figure 9).

DISCUSSION
AF is the most common tachyarrhythmia in the clinic. The typical clinical manifestations of AF are palpitations, fatigue, chest tightness, and decreased exercise tolerance, which seriously affect the life quality of patients (22). AF increases the risk of ischemic stroke by five times higher than healthy people and leads to high morbidity and mortality (23). Despite the fact that lots of efforts have been made, however, the molecular mechanism of AF development is still not completely understood. Therefore, it is significant urgent to clarify the pathogenesis of AF and find potential therapeutic targets.
In this study, we downloaded the GSE115574 dataset from the GEO database and estimated the composition of the immune cells using Cibersort based on the expression matrix. WGCNA was performed to determine the module with the most robust relationship between genes in the module and immune cell types. A total of eleven modules were identified, and the magenta module was significantly correlated with M1 macrophages. To the best of our knowledge, it is the first time that WGCNA has been used to analyze the relationships between immune cell types and AF. We then performed enrichment analysis on genes in the magenta module. According to GO analysis, genes were mainly enriched in immune response, ureteric bud development, aorta development, and extracellular matrix organization. KEGG pathway analysis demonstrated that genes were mainly enriched in staphylococcus aureus infection, MAPK signaling pathway,  and mTOR signaling pathway. Based on the PPI network and CytoHubba, we identified three hub genes, including CTSS, CSF2RB, and NCF2. As we know, the relationships between these three genes and the molecular mechanism of AF has not been studied, which is worth further research. Finally, qRT-PCR was performed to verify the expression of CTSS, CSF2RB, and NCF2. The expression levels of these three genes in patients with AF were significantly higher than those in patients with SR, which were consistent with the bioinformatic analysis. The expression levels of these three genes were also verified in three other datasets, GSE31821, GSE41177, and GSE79768.
Previous studies have shown that macrophages are associated with atrial fibrosis, leading to structural remodeling in the process of AF (2,7,24). Cytokines released by macrophages such as tumor necrosis factor-α (TNF-α) and interleukin-1β (IL-1β) can activate fibroblast proliferation, leading to fibrous tissue formation (25). However, few studies have investigated the precise role of these cytokines in the molecular mechanism of AF.
CTSS is a lysosomal cysteine proteinase, playing a crucial role in the degradation of antigenic proteins on major histocompatibility complex (MHC) class II molecules (26). CTSS is associated with many inflammatory and autoimmune diseases. CTSS expressed by intimal macrophages was involved in atherogenesis, and deficiency of CTSS could reduce atherosclerosis in LDL receptor-deficient mice (27). miR4498/CTSS might polarize macrophages into proinflammatory phenotype and accelerate chronic atherosclerotic inflammation (28). CTSS participated in the abdominal aortic aneurysm (AAA) formation, and inhibition of CTSS suppressed AAA formation in mice (29). A previous study showed that CTSS-mediated induction of CX3CL1 might contribute to the ocular surface and lacrimal glands inflammation in Sjögren's syndrome with a 4.5-fold increase in CX3CR1-expressing macrophages (30). In chronic obstructive pulmonary disease, reduction in CTSS expression prevents loss of lung function, reduces inflammation, and slows the lung tissue remodeling (31). Moreover, CTSS was identified as novel biomarkers for diseases and physiological processes, including triple-negative breast cancer, sarcoidosis, and particulate-induced lysosomal disruption in macrophages (32)(33)(34).
CSF2RB is the common beta chain of the high affinity receptor for interleukin-3, interleukin-5, and colony-stimulating factor. The role of CSF2RB has been studied in many diseases. Runt-related transcription factor 1 directly bound to the promoters of CSF2RB, which regulated apoptosis of neuroblastoma (35). The activating hotspot mutation in CSFR2B was identified in myeloid leukemia in Down syndrome (36). The mutation in CSF2RB can also cause hereditary pulmonary alveolar proteinosis (PAP) (37). In CSF2RB-/-mice, statin therapy reduces cholesterol accumulation in alveolar macrophages and ameliorates PAP (38). Moreover, CSF2RB was found overexpressed on monocytes from Alzheimer's disease patients, which contributed to granulocyte-macrophage colonystimulating factor-induced monocyte migration (39). However, the role of CSF2RB has never been studied in AF. Further research is required to determine whether CSF2RB can become a novel therapeutic target for AF.
NCF2, encoding neutrophil cytosolic factor 2, mainly results in autoimmune diseases. In a previous case-control study, four single-nucleotide polymorphisms within the NCF2 gene were genotyped, and the rs10911362 variants were associated with a decreased TB risk in the Western Chinese Han population (40). NCF2 deficiency resulted in granulomas, and the NCF2 mutation caused diverse and unusual clinical phenotype of chronic granulomatous disease (41,42). In multiple sclerosis, NCF2 was identified to be associated with eleven single nucleotide polymorphisms (43). Nevertheless, the relationship between NCF2 and AF has not been elucidated.
There are some limitations to our study. First, the data we used was from public databases, which were limited in the sample size.
Further prospective studies on more patients should be carried out to support our results. Second, although we have performed qRT-PCR to verify the expression levels of genes, mechanistic studies need to be conducted.

CONCLUSION
In this study, we performed WGCNA to analyze the relationships between immune cell types and AF for the first time. Three novel genes (CTSS, CSF2RB, and NCF2) which have never been studied in AF were identified. These three genes are worthy of further study and may become potential therapeutic targets in AF.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by Medical Ethics Committee of Zhongshan Hospital, Fudan University. The patients/participants provided their written informed consent to participate in this study.