Weighted Gene Co-expression Network Analysis Identifies Crucial Genes Mediating Progression of Carotid Plaque

Background Surface rupture of carotid plaque can cause severe cerebrovascular disease, including transient ischemic attack and stroke. The aim of this study was to elucidate the molecular mechanism governing carotid plaque progression and to provide candidate treatment targets for carotid atherosclerosis. Methods The microarray dataset GSE28829 and the RNA-seq dataset GSE104140, which contain advanced plaque and early plaque samples, were utilized in our analysis. Differentially expressed genes (DEGs) were screened using the “limma” R package. Gene modules for both early and advanced plaques were identified based on co-expression networks constructed by weighted gene co-expression network analysis (WGCNA). Gene Ontology (GO) and Kyoto Encyclopedia of Genes Genomes (KEGG) analyses were employed in each module. In addition, hub genes for each module were identified. Crucial genes were identified by molecular complex detection (MCODE) based on the DEG co-expression network and were validated by the GSE43292 dataset. Gene set enrichment analysis (GSEA) for crucial genes was performed. Sensitivity analysis was performed to evaluate the robustness of the networks that we constructed. Results A total of 436 DEGs were screened, of which 335 were up-regulated and 81 were down-regulated. The pathways related to inflammation and immune response were determined to be concentrated in the black module of the advanced plaques. The hub gene of the black module was ARHGAP18 (Rho GTPase activating protein 18). NCF2 (neutrophil cytosolic factor 2), IQGAP2 (IQ motif containing GTPase activating protein 2) and CD86 (CD86 molecule) had the highest connectivity among the crucial genes. All crucial genes were validated successfully, and sensitivity analysis demonstrated that our results were reliable. Conclusion To the best of our knowledge, this study is the first to combine DEGs and WGCNA to establish a DEG co-expression network in carotid plaques, and it proposes potential therapeutic targets for carotid atherosclerosis.


INTRODUCTION
Carotid atherosclerosis is characterized by lipid accumulation and inflammation, which underlie the thickening of the carotid intima where the plaque is formed (Libby et al., 2011). Compared with early plaques, advanced plaques are more vulnerable and prone to rupture. Surface rupture of the plaque leads to abrupt thrombus formation which, in turn, triggers cerebrovascular disease, including transient ischemic attack and stroke (Golledge et al., 2000).
In recent years, the rapid progress of microarray and RNA-seq technologies has facilitated gene expression profiling. It has been determined that hemoglobin metabolism and bone resorption are crucial pathways in plaque vulnerability, and dysregulated genes, including SYNPO2, LMOD1 and PPBP, have been identified in carotid plaques (Perisic et al., 2016). In addition, Alloza et al. (2017) reported from an RNA-seq-based transcriptomic study that smooth muscle cells (SMCs) from unstable plaques showed a senescence-like phenotype, while stable plaques were suggestive of an osteogenic phenotype. A large meta-analysis of GWAS implicated one novel locus (PIK3CG) involved in carotid plaque and eight novel susceptibility loci linked with carotid artery intima thickness (Franceschini et al., 2018). Despite this progress, the molecular mechanisms involved in the formation and progression of carotid plaques have not been fully elucidated.
Weighted gene co-expression network analysis (WGCNA) is a powerful tool to identify gene co-expression modules, explore the correlation of the modules and phenotypes and discover hub genes that regulate critical biological processes (Zhang and Horvath, 2005). WGCNA has been widely employed in the cardiovascular field to investigate such topics as abdominal aortic aneurysms, obstructive coronary artery disease and varicose veins (Liu et al., 2016;Chen et al., 2019;Mo et al., 2019;Wang Y. et al., 2019;Zhang et al., 2019).
To clarify the potential molecular mechanisms underlying carotid atherosclerosis, we analyzed 2 datasets of transcriptomes downloaded from the Gene Expression Omnibus (GEO) 1 and Sequence Read Archive (SRA) databases. Differentially expressed gene screening was conducted, and co-expression networks were constructed by WGCNA. Gene Ontology (GO) and Kyoto Encyclopedia of Genes Genomes (KEGG) pathway analyses were conducted in each module, and hub genes in each functional module were identified. Next, the DEG co-expression network was created, and crucial genes were mined based on this network using Molecular Complex Detection (MCODE). These crucial genes were validated using another independent dataset, GSE43292, through the WGCNA pipeline. Gene set enrichment analysis for 3 crucial genes was performed, and the fraction of 22 immune cells was determined and compared between early and advanced plaques by CIBRTSORT (Newman et al., 2015). Least absolute shrinkage and selection operator (LASSO) regression and linear discriminant analysis (LDA) were also performed to build a classifier to discriminate between advanced plaque and early plaque. Finally, a sensitivity analysis was performed to evaluate the robustness of the network we constructed. 1 https://www.ncbi.nlm.nih.gov/geo/

Datasets
Three datasets, GSE28829, GSE104140, and GSE43292, were selected from the GEO database. GSE28829 and GSE104140 were used for WGCNA, while GSE43292 was used as the validation dataset. GSE28829 and GSE43292 were mRNA microarray datasets. GSE28829 involved 16 advanced plaque samples and 13 early plaque samples, while GSE43292 involved 32 atheroma plaques and paired macroscopically intact tissue. The series matrix file and data table of the microarray platforms GPL570 and GPL6244 were downloaded. GSE104140 was an RNA sequencing dataset that involved 19 advanced plaque samples and 13 early plaque samples. The fastq RNA sequencing data were downloaded from the SRA database (SRP118628). This study involved no human or animal subjects.

Data Preprocessing and DEG Screening
Data tables of GPL570 and GPL6244 were used to annotate the series matrix files of GSE28829 and GSE43292, respectively, with official gene symbols (i.e., replace the probe name with the official gene symbol), and the gene expression matrices were obtained. Spliced Transcripts Alignment to a Reference (STAR) software (Dobin et al., 2013) was used to conduct RNA quantification to obtain the expression matrices of GSE104140. Next, we merged the gene expression matrix of GSE28829 and the TPM matrix of GSE104140. The "sva" R package was employed to remove batch effects (Supplementary Figure S1). Finally, the "limma" R package was utilized to conduct differentially expressed gene (DEG) screening. | log 2 (fold-change)| >1 and adjusted p < 0.05 were set as thresholds of DEG screening. DEG analysis was also conducted for the validation set GSE43292, and we compared all the DEGs (adjusted p < 0.05) between the training set (GSE28829 and GSE104140) and this validation set.

Co-expression Network Construction by WGCNA
The "WGCNA" R package was employed to construct coexpression network for both advanced plaque samples and early plaque samples. All the genes were involved in further analysis. Pearson's correlation matrix was calculated. Next, using the formula a mn = | c mn | β (where a mn represents adjacency between gene m and gene n, c mn represents Pearson's correlation coefficient between gene m and gene n, and β represents softpower threshold), the weighted adjacency matrix was created. A topological overlap measure (TOM) was created based on an adjacency matrix for gene module detection. In the detection of gene modules, average linkage hierarchical clustering was employed to build a clustering dendrogram, and the minimal gene module size was 100. After gene module detection, similar modules were merged with the threshold of 0.25. The atheroma samples in the validation set (GSE43292) were also included in another WGCNA.  Volcano plot. The x-axis represents the -log 10 (adj.P.Val) while the y-axis represents log 2 (fold-change). The red dots represent up-regulated DEGs while the green dots represent the down-regulated DEGs.

Functional and Pathway Enrichment Analysis
The "clusterProfiler" R package was employed for gene ontology (GO) and Kyoto Encyclopedia of Genes Genomes (KEGG) pathway analysis for genes in each module. The threshold for the analysis was set as count >2 and adjusted p < 0.05. Next, we compared the count number of the top 10 significant terms of modules of interest to determine the difference in co-expression patterns between advanced plaque samples and early plaque samples using Wilcoxon's rank sum test.

Hub Gene Identification and Crucial Gene Mining
Genes with the highest intramodular connectivity and module membership (MM) >0.8 calculated by the "WGCNA" R package were identified as hub genes for each functional module. Next, the DEG co-expression networks were obtained by mapping DEGs into the whole co-expression network of advanced plaque samples and early plaque samples using Cytoscape v3.7.0. MCODE, a plugin of Cytoscape, was employed to detect the densely connected subnetwork of the DEG co-expression network of advanced plaque samples. The DEG co-expression network of advanced plaque samples and the most significant subnetwork was visualized, and genes with the top 10 degrees in this subnetwork were selected as crucial genes. The MM of the hub genes in the functionally important module was compared between advanced plaque samples and early plaque samples. A DEG co-expression network of the validation set (GSE43292) was also constructed to validate crucial genes. Furthermore, receiver operating characteristic (ROC) analysis was conducted by SPSS 25.0 to demonstrate the potential diagnostic value of these crucial genes to discriminate advanced plaque and early plaque samples.

LASSO Regression Model and LDA
The lasso algorithm was performed using the R package "glmnet" to prioritize candidate genes in the MCODE subnetwork for building a classifier. The optimal lambda for the coefficient was computed with a minimum value of 10-fold cross-validation in the training set. The dataset was randomly split into training and validation groups at a proportion of 3:1. The LDA model was established and verified using the R package "MASS." Gene Set Enrichment Analysis GSEA for the 3 crucial genes with the highest degree was conducted using GSEA 4.0.3 software. The c5.bp.v7.0.symbols.gmt and c2.cp.kegg.v7.0.symbols.gmt reference gene sets were downloaded from the official GSEA website 2 . To perform GSEA, the plaque samples were divided into two groups (i.e., samples with high expression levels of crucial genes vs. samples with low expression levels of crucial genes) according to the median value of each gene. Compared with conventional GSEA analysis, the grouping of samples is based on the expression level of each gene in single-gene analysis.

Sensitivity Analysis of Networks We Constructed
To ensure the robustness of the networks we constructed, we performed a sensitivity analysis. One sample was randomly deleted based on the random number generated by the runif() function in R software. After this sample was deleted, the whole WGCNA pipeline analysis was performed, and crucial genes were identified.

Statistical Analysis
Data preprocessing, DEG screening, WGCNA and functional and pathway enrichment analysis were performed in R v3.6.2. Crucial genes were mined using MCODE in Cytoscape v3.7.0. GSEA was conducted by GSEA v4.0.3. We described the details of these bioinformatic analyses in the corresponding subsections. The count numbers of the top 10 significant functional terms of modules of interest between advanced plaque and early plaque samples were compared by Wilcoxon's rank sum test, and the potential diagnostic value of key genes was demonstrated by ROC analysis using IBM SPSS 25.0. The grouping variable of advanced plaque and early plaque (State Variable in SPSS) and the expression value of crucial genes (Test Variable in SPSS) were entered as input, and non-parametric assumptions were chosen as default in ROC analysis. A p < 0.05 was considered to be significant.

Code Availability
The code used in this study (from the GEO datasets to the WGCNA analysis to the crucial gene mining) is available at: https://github.com/cmy2013/classify-early-and-advancedcarotid-plaque.

Workflow
The workflow of the present study is shown in Figure 1. DEGs were screened, and the co-expression networks were constructed. A DEG co-expression network was constructed by mapping DEGs into the co-expression network of advanced plaque samples. Crucial genes were obtained based on this DEG co-expression network. These crucial genes were validated based on another independent GEO dataset through the WGCNA pipeline. Next, the genes in the MCODE cluster were used to build a classifier to discriminate advanced plaque and early plaque by lasso regression and LDA analysis. After that step, single-gene GSEA was performed for crucial genes, and their potential clinical significance was determined by ROC analysis. In addition, gene modules were detected for both early plaque and advanced plaque samples. Functional enrichment analysis was conducted using genes in each module. The co-expression patterns were compared between early plaque samples and advanced plaque samples based on the results of functional enrichment analysis. Finally, a sensitivity analysis was performed to evaluate the robustness of the network that we constructed.

Screening of DEGs
The expression patterns of DEGs are shown in Figure 2. With the threshold of adjusted p < 0.05 and | log 2 (fold-change)| >1, we screened 436 DEGs, among which 335 were up-regulated and 81 were down-regulated. DEGs with top-10 | log 2 (fold-change)| are shown in Table 1. All DEGs for the training set and validation set are listed in Supplementary Tables S1, S2, respectively. A total of 4,057 overlapping DEGs were identified between the training set and validation set only with the threshold of adjusted p < 0.05 (Supplementary Figure S2).

Constructed of Weighted Gene Co-expression Network
No outliers were excluded by sample clustering (Figure 3A and Supplementary Figure S3A). The cut-off of R 2 3 was set to be 0.85 and soft-threshold power β = 14 and β = 18 were selected for advanced plaque and early plaque samples, respectively (Figures 3B,C and Supplementary Figures S3B,C). The adjacency matrices that store the information of whole co-expression networks were constructed. The histogram and the linear plot showed that both networks that we constructed met the requirements of scale-free topology (Figures 3D,E and Supplementary Figures S3D,E). Next, TOM matrices were created, and gene modules were detected based on TOM matrices. After merging similar modules, seven gene modules were detected for both advanced and early plaque samples ( Figure 3F and Supplementary Figure S3F).

Functional Enrichment of Gene Modules
The biological functions of the genes in each module were shown by GO-BP and KEGG pathway analysis (Supplementary Tables  S3, S4). The co-expression patterns between advanced plaque samples and early plaque samples were compared based on these results. We observed that the black module of advanced plaque samples was associated with inflammation and immune response. GO-BP and KEGG pathways with top-10 significant adjusted p value of this module was shown in Figure 4 and Table 2. The Wilcoxon's rank sum test (p = 0.0003) showed that the count number of black module of advanced plaque samples of these GO-BP and KEGG pathways was significantly larger that of magenta module of early plaque samples, which is the corresponding immune and inflammation associated module in early plaque samples (Supplementary Figure S4). This suggested that genes associated with inflammation and immune response were scattered in the modules of early plaque samples. These results suggested that inflammation and the immune response might play roles in the progression of carotid plaques.

Identification of Hub Genes and Mining of Crucial Genes
The intramodular connectivity for genes in each module was calculated by the WGCNA algorithm. Genes with the highest  Next, a DEG co-expression network was generated by mapping DEGs into the whole co-expression network of advanced plaques. After removing isolated nodes and node pairs, the network had 393 edges and 344 nodes (Supplementary Figure S5). The MCODE subnetwork and 10 crucial genes were visualized (Figure 5). NCF2 (neutrophil cytosolic factor 2), IQGAP2 (IQ motif containing GTPase activating protein 2) and CD86 (CD86 molecule) were the genes with the highest degree in the crucial gene cluster. ROC analysis was performed, and all these crucial genes showed potential diagnostic significance. The areas under the curve (AUCs) of most crucial genes were above 0.90 ( Table 3). The ROC curves of NCF2, IQGAP2, and CD86 are visualized in Figure 6. All these crucial genes were found in the DEG co-expression network of GSE43292 and were successfully validated (Figure 7 and Supplementary Table S6).

LASSO Regression Analysis and LDA
To identify prognostic markers in the progression of atherosclerosis, the top 100 genes in the MCODE subnetwork were input into the LASSO regression model. The lambda value was 0.0125, and 15 genes were selected to calculate the risk score ( Figures 8A,B and Supplementary Table S7. The risk model was constructed with the coefficients of 15 genes: IQGAP2, FPR3, FCER1G, SLC1A3, C5AR1, PLA2G7, ALOX5, CCR1, RASGRP3, SLAMF8, C3AR1, AIF1, AMPD3, BTK, and CTSB. There was a clear shift in LDA function, with a left shift being observed for early plaques and a right shift for advanced plaques. The accuracy of the LDA model using the confusion matrix algorithm was 100% ( Figure 8C). These results indicated that the classifier can clearly discriminate advanced plaque and early plaque.

Gene Set Enrichment Analysis
Single-gene GSEA for each crucial gene was performed. Gene sets associated with the immune response and inflammation were highly up-regulated in the group with high expression levels of crucial genes (e.g., the NCF2-high group), which also suggested that the immune response and inflammation play important roles in the progression of carotid atherosclerosis (Figure 9).

Sensitivity Analysis of Constructed Network
We ranked the sample ID in ascending order. Next, the runif() function generated 18 and 13 as random numbers for advanced plaque samples and early plaque samples. The corresponding samples (SRR6059638 for advanced plaque and GSM714098 for early plaque samples) were deleted. The results of sample clustering (Supplementary Figures S6A, S7A) were nearly the same as those of the original network construction ( Figure 3A and Supplementary Figure S3A), except the deleted samples were not in the results. Then, with the same cutoff, the same soft-threshold βs (β = 14 for advanced plaque samples and β = 18 for early plaque samples) were selected to construct coexpression network (Supplementary Figures S3B-E, S6B-E). In the module detection, the same numbers of modules (7 gene modules) were detected for both groups (Supplementary Figures  S6F, S7F). Finally, most of the crucial genes overlapped with the original analysis, except that ARRB2 was replaced by ACP5 (Supplementary Figure S8), which was also in the MCODE cluster. The results of the sensitivity analysis showed that our results were reliable.

DISCUSSION
In the present study, 436 DEGs were identified in plaque samples. Based on weighted gene co-expression networks and modules were constructed using the WGCNA algorithm. Functional enrichment analyses of GO and KEGG were performed for each module. In the GO-BP and KEGG pathway analyses, pathways associated with inflammation and immune response were clustered in the black module of advanced plaque samples. These pathways were dispersed in early plaque samples. NCF2, IQPAG2, and CD86 had the highest connectivity among the crucial genes identified by MCODE. Based on single gene GSEA, gene sets correlated with inflammation and immune response were highly up-regulated in the group of high expression level of crucial genes. Finally, the sensitivity analysis showed that our results were reliable. Several previous studies have used the microarray dataset GSE28829 to screen DEGs Wang et al., 2014;Tan et al., 2017;Liu et al., 2018). Different methods and different criteria lead to differences in DEG results. Wang et al. screened 267 up-regulated genes and 52 down-regulated genes. The hub gene was TYRO protein tyrosine kinase binding protein (TYROBP). CD14, which is one of the crucial genes in the present study, was one of up-regulated the up-regulated genes revealed by COXPRESdb (Wang et al., 2014). Liu et al. obtained 758 differentially expressed genes with thresholds of FDR <0.05 and | log 2 FC| >0.58. ITGAM and ACTN2 have the highest degree in the PPI network (Liu et al., 2018). Lin et al. detected DEGs by robust multiarray average (RMA). In the study by these researchers, 322 up-regulated genes and 385 down-regulated genes were identified in advanced plaque samples. CFL2 and MMP9 were identified as important genes in the transcriptional regulatory network . Tan et al. used the signalto-noise method and listed the top 100 up-regulated genes and top 100 down-regulated genes. IGK, MMP9, and IGLC2 are hub genes for up-regulated genes. AW451999, CFL2, and PDZRN3 are hub genes for down-regulated genes (Tan et al., 2017). Other datasets were also employed to identify the DEGs. Liu et al. (2020) analyzed GSE41571, GSE118481, and E-MTAB-2055 and discovered that JCHAIN, CXCL10, and HMOX1 were the crucial  genes for up-regulated genes, and the most down-regulated genes were COL21A1, DACT3, and ACTC1 in unstable plaques. However, these studies did not use a weighted and scale-free network approach, which identifies genes with critical positions in the whole network, to elucidate the molecular mechanism governing plaque progression. Some studies have identified key genes in atherosclerosis progression using WGCNA. By applying lncRNA classification to the GSE28829 dataset and confirming correlated genes by conducting WGCNA to analyze GSE40231, Wang et al. identified 6 lncRNAs, including ZFAS1 and LOC100506730, that play crucial roles in atherosclerosis. These lncRNAs are involved in the interferon-gamma-mediated signaling pathway and leukotriene biosynthetic process (Wang C.H. et al., 2019). Albright et al. (2014) discovered that pathways that regulate inflammatory and macrophage gene function were correlated with atherosclerosis, and CD44 was a critical gene.
The present study compared the co-expression patterns between early and advanced plaques using the WGCNA method and generated a DEG co-expression network by mapping DEGs into the whole co-expression network of advanced plaque samples for the first time.
The functional enrichment of WGCNA modules indicated that inflammation and immune response are related to the progression of atherosclerosis. Atherosclerosis is overwhelmingly proven to be a chronic inflammatory disease (Hansson et al., 2006;Wolf and Ley, 2019). Bioinformatic, experimental, clinical and epidemiological studies demonstrated a relationship between inflammation and lipid metabolism. After low-density lipoproteins deposit in the intima, monocytes are recruited by leukocyte adhesion molecules and differentiate into macrophages. By upregulating scavenger receptors and Toll-like receptors, macrophages mediate lipoprotein internalization and transmit activating signals. Severe inflammation may lead to plaque rupture and thrombosis. Inflammatory markers, including CRP, IL-6, and Pentraxin-3, are used to monitor the atherosclerosis process in coronary artery disease (Hansson et al., 2006). The immune response is initiated by specific antigens, including the ApoB100 protein of LDL. Regulatory T cells secrete atheroprotective IL-10 and TGF-β to attenuate atherosclerosis, while Th1 cells promote atherosclerosis. As atherosclerosis progresses, T cells express proatherogenic Treg transcription factors, including RORγ-T (Th17), Bcl-6 (TFH), or T-bet (Th1), instead of the atheroprotective Th transcription factor FoxP3 (Gistera and Hansson, 2017). This finding is consistent with the fraction of regulatory T cells in plaque samples that we calculated by the CIBERSORT algorithm.
The hub gene of the black module was ARHGAP18. ARHGAP18 (SENEX) is a negative regulator of YAP and RhoC (Coleman et al., 2020). Overexpression of RhoC destroys the actin cytoskeleton and cell junctions. ARHGAP18 regulates the alignment of endothelial cells and transmigration of neutrophils by stabilizing microtubes (Lovelace et al., 2017). Lay et al. (2019) reported that the expression of ARHGAP18 increased in areas where plaques formed and progressed. This finding is in keeping with our conclusion. Areas with turbulence and low shear force, such as carotid artery bifurcation, are highrisk areas for arterial plaque formation. In in vitro experiments, the expression of ARHGAP18 increased in areas with low wall shear stress and turbulence, which are risk factors for atherosclerosis.
Among these crucial genes detected by MCODE based on the DEG co-expression network, NCF2, IQGAP2, and CD86 had the highest connectivity. ROC curve analysis of these three genes was performed. The AUC was 0.918 for NCF2, 0.853 for IQGAP2, and 0.923 for CD86.
NCF2 encoded a 526-amino acid protein P67 phox (de Albuquerque et al., 2019). Recent reports demonstrated that NCF2 was significantly associated with hypertension by increasing the expression and activity of NOXs and reactive oxygen species (ROS) generation (Risley et al., 2003;Li et al., 2018). Bioinformatic analyses indicated that up-regulated expression of NCF2 increased susceptibility to unstable atherosclerotic plaque-related stroke (Zhou et al., 2020). Few studies, however, have evaluated the potential mechanism of the correlation between the NCF2 variant and the progression of atherosclerosis. P67 phox is a subunit of NADPH oxidase 2 (NOX2). NOX2 is up-regulated in unstable atherosclerotic plaques (Singel and Segal, 2016). NOX2 activity contributes to the formation and accumulation of oxidized low-density lipoproteins (ox-LDL) via ROS products (Violi et al., 2017). The association between NOX2 and atherosclerosis was substantiated by mouse knockouts lacking subunits of Nox2. Inhibition of NOX2 by pharmacological intervention also delayed the atherosclerotic process. These results indicated that NCF2 might mediate the progression of atherosclerosis through NOX2.
IQGAP2 expresses IQ motif-containing GTPase-activating protein 2, which is a 180-kDa multidomain scaffolding protein (Ghaleb et al., 2015). IQGAP2 was first classified as a tumor suppressor expressed in the liver and was reported to regulate the PI3K/Akt and Wnt/β-catenin signaling pathways (Gnatenko et al., 2013). The PI3K/Akt pathway was demonstrated to be associated with polarization and apoptosis of macrophages by controlling mTOR assembly (Linton et al., 2016). Seimon and Tabas (2009) reported that stimulating macrophage apoptosis decreased atherosclerotic lesions in early plaques and that suppressing apoptosis increased atherosclerosis. In addition, the role of IQGAP2 in diminishing the production and recruitment of macrophages was evidenced in animal experiments. IQGAP2 has been shown to bind Rho GTPase rac1/cdc42 by the GTPasebinding domain in platelets and thus regulate NF-κB signaling (Schmidt et al., 2003). The signaling of Toll-like receptor 4 (TLR4)/NF-κB was impaired in Iqgap2 −/− mice. Interleukin-6 (IL-6), a pro-inflammatory cytokine driven by NF-κB, was subsequently suppressed. Various studies have demonstrated that IL-6 is essential in the downstream inflammatory response that mediates the initiation and progression of atherosclerosis (Wang et al., 2017).
CD86 expression and the M1/M2 marker ratio (CD86/CD163) were significantly higher in vulnerable plaques than in stable plaques (Williams et al., 2017). Deposited oxidized lipoproteins activate scavenger receptors, such as CD36 and TLR4, to polarize macrophages to the M1 subtype and eventually form foam cells (Momtazi-Borojeni et al., 2019). M1 phenotype macrophages release ROS and proinflammatory cytokines, such as TNF-α, IL-1β, IL-6, and IL-12, that damage endothelial cells and vessels (Shapouri-Moghaddam et al., 2018). In addition to M1 macrophages, CD86 was also found to be expressed on mature dendritic cells and partial T cells (Ewing et al., 2013). The expression of CD86 on dendritic cells was shown to be up-regulated in patients with coronary artery disease (Dopheide et al., 2007). In CD80 −/− CD86 −/− mice, the progression of atherosclerosis was delayed. Vignali et al. (2008) reported that CD80/86 dendritic cells play roles in the co-stimulation of Th1 cells. The proatherogenic role of Th1 cells has been clearly evidenced in a large body of animal experimental studies (Saigusa et al., 2020). T-cell and CD28-CD80/86 co-stimulation plays a vital role in both plaque formation and atherosclerosis development (Zirlik and Lutgens, 2015).
In addition to the top 3 crucial genes, other genes, such as the LIPA gene, also play important roles in atherosclerosis by encoding lysosomal acid lipase (LAL). Depletion of LIPA causes lipid metabolism disorders in mice (Li and Zhang, 2019). Bioinformatic studies identified that LIPA is correlated with coronary heart disease (Mehta, 2011;Nelson et al., 2017). Lysosomal acid lipase increased the release and degradation of triglycerides and cholesteryl esters in macrophages and hepatocytes (Du and Grabowski, 2004). The fatty acid liberated by hydrolysis elevated the expression of peroxisome proliferator-activated receptor-γ, which upregulated the production of CD36 and consequently enhanced the uptake of oxLDL (Chistiakov et al., 2016). Further studies are necessary to elucidate how lysosomal acid lipase contributes to atherosclerosis.
Single-gene GSEA indicated that crucial genes were significantly associated with immune and inflammatory pathways. The most frequent pathways related to crucial genes were B cell receptor signaling, the cell cycle, T cell receptor signaling and the Toll-like receptor signaling pathway. NCF2 plays key roles in the TLR4/NOX2 signaling pathway by expressing the subunit of NOX2 . IQGAP2 regulates TLR4/NF-κB signaling by binding Rho GTPase rac1/cdc42. The activation of the Toll-like receptor upregulates the expression of CD86 in macrophages and T cells. The results of this study confirmed the biological significance of this method for screening crucial genes.
In this study, we constructed a DEG co-expression network between early carotid plaques and advanced plaques for the first time. The hub genes and crucial genes were identified. NCF2, IQGAP2, and CD86 might play crucial roles in the process of carotid atherosclerosis. However, our study has several limitations. These genes have potential value for the diagnosis and treatment of carotid plaques. The GSE28829 and GSE104140 datasets lack clinical information. We cannot correlate clinical traits with gene modules. The results were based on data downloaded from GEO datasets, and further studies are needed to explore the detailed molecular mechanism of carotid atherosclerosis progression in vitro and in vivo.

AUTHOR CONTRIBUTIONS
MC designed experiments, analyzed data, and wrote the manuscript. SC designed experiments, prepared figures and tables, and wrote the manuscript. DY devised the concept, analyzed the data, and revised the manuscript. JZ, BL, YC, and WY designed experiments and edited the manuscript. HZ and LJ analyzed data and wrote the manuscript. YZ devised the concept, designed the research, supervised the study, and edited the manuscript. All authors contributed to the article and approved the submitted version. Supplementary Figure 6 | Construction of the co-expression network for advanced plaque in sensitivity analysis. SRR6059638 was delete based on generated random number. (A) No outliers were detected in the sample clustering and all samples were included in further study. (B,C) The cut-off for soft-threshold β was set to be 0.85 and β = 14 was selected. (D,E) The co-expression network we constructed met the requirements of scale-free topology. (F) In this sensitivity analysis of advanced plaque sample, 7 gene modules were detected, which is the same as that of the original network construction of advanced plaque. Figure 7 | Construction of the co-expression network for early plaque in sensitivity analysis. GSM714098 was delete based on generated random number. (A) No outliers were detected in the sample clustering and all samples were included in further study. (B,C) The cut-off for soft-threshold β was set to be 0.85 and β = 18 was selected. (D,E) The co-expression network we constructed met the requirements of scale-free topology. (F) In this sensitivity analysis of early plaque sample, 7 gene modules were detected, which is the same as that of the original network construction of early plaque. Figure 8 | Crucial genes identified in sensitivity analysis. Most of the crucial genes overlapped with the original analysis, except that ARRB2 was replaced by ACP5