Integrated DNA Methylation and Gene Expression Analysis Identified S100A8 and S100A9 in the Pathogenesis of Obesity

Background: To explore the association of DNA methylation and gene expression in the pathology of obesity. Methods: (1) Genomic DNA methylation and mRNA expression profile of visceral adipose tissue (VAT) were performed in a comprehensive database of gene expression in obese and normal subjects. (2) Functional enrichment analysis and construction of differential methylation gene regulatory networks were performed. (3) Validation of the two different methylation sites and corresponding gene expression was done in a separate microarray dataset. (4) Correlation analysis was performed on DNA methylation and mRNA expression data. Results: A total of 77 differentially expressed mRNAs matched with differentially methylated genes. Analysis revealed two different methylation sites corresponding to two unique genes—s100a8-cg09174555 and s100a9-cg03165378. Through the verification test of two interesting different expression positions [differentially methylated positions (DMPs)] and their corresponding gene expression, we found that methylation in these genes was negatively correlated to gene expression in the obesity group. Higher S100A8 and S100A9 expressions in obese subjects were validated in a separate microarray dataset. Conclusion: This study confirmed the relationship between DNA methylation and gene expression and emphasized the important role of S100A8 and S100A9 in the pathogenesis of obesity.


INTRODUCTION
With continuous improvement of living conditions, many countries have to pay more attention to the prevalence of obesity because obesity has reached the proportion of epidemic that is still rising (1). According to the World Health Organization's report in 2015, about one third or more adults are overweight, 43% of whom are male and 45% are female (2). Obesity is implicated in many diseases, such as metabolic syndrome (MetS), type 2 diabetes mellitus (T2DM), hypertension, arteriosclerosis, cardiovascular disease (CVD), and so on (3), and thus incurs heavy economic burdens on countries around the world (4). Obesity results from interactions between genetic and environmental factors, but for individuals, epigenetic factors can increase susceptibility to obesity (5).
In recent years, with the deepening of research, epigenetics has emerged as a bridge between genes and environmental factors. It can change gene expression and induce longterm changes in phenotype and disease susceptibility (6). Extensive epigenome-wide association studies (EWAS) on DNA methylation conducted in many populations have revealed the relationship between DNA methylation and obesity (7). Changes in gene methylation changes can alter the transcription of genes resulting in abnormal gene expression and eventually obesity (8).
Gene Expression Omnibus (GEO) database hosted the sequencing data of thousands of researchers, and its most important feature was that it provided open access to the data we needed to conduct our research. In the present study, we explored innovative methylation sites of obesity-related DNA by conducting an integration study using two microarray datasets and established the relationship between the expression of obesity-related genes and methylation Differential Expression of Methylated Genes (DEMGs). We then validated the DEMGs in a separate microarray dataset to explore the potential relationship between DNA methylation and mRNA expression on the regulation of obesity.

Gene Expression Profile and Probe Labeling
Three microarray datasets (GSE88837, GSE88940, and GSE109597) were downloaded from the gene expression database (https://www.ncbi.nlm.nih.gov/geo/) for analysis. GSE88837 was extracted from the U133 + 2.0 sequence of gpl570 Affymetrix human genome for gene expression. The purpose of this study was to use global gene expression to identify obesity-induced changes in gene expression profiles of lean and obese adolescent females. In our study, subjects with body mass index (BMI) ≥30 were defined as obese (1). A total of 30 subjects (including 15 obese and 15 healthy controls) were analyzed. We used the Affy package in R (9) to convert the cel files into an expression value matrix and the Robust Multichip Average (RMA) method to normalize the matrix. The Bioconductor package in R software was used to convert probe data into genes (10). If a gene corresponded to several probes, we chose the average expression value for further analysis. GSE88940 extracted from gpl13534, a human methylation 450 gene chip, was used for DNA methylation analysis, which consisted of 10 objects and 10 lean controls. Genomic DNA was extracted from visceral adipose tissue (VAT) of lean and obese adolescent females. Illumina Infinium Human Methylation 450 k BeadChips were utilized for global methylation profiles in VAT. All data processing was done in GEO2R (https://www.ncbi.nlm.nih.gov/ geo/geo2r/). In these two datasets (GSE88837 and GSE88940), there are 20 samples of the same person. We only analyze these 20 samples. GSE109597 was used as the validation dataset and aimed to predict obesity risk with genetic data, specifically, obesity-associated gene expression profiles. Genetic risk score was computed. The genetic risk score was significantly correlated with BMI when an optimization algorithm was used. Linear regression and built support vector machine models predicted obesity risk using gene expression profiles and the genetic risk score with a new mathematical method. The analysis method was the same as used for GSE 88837.

Differential Expression and Analysis of Methylated Genes (DEMGs)
We compared obese with control subjects to explore the differentially expressed genes (DEGs) of the marginal envelope in R (11). The threshold value was set as | log2 fold change |≥ 2, P < 0.05. GEO2R was used to determine the methylation sites [differentially methylated positions (DMPs)] by comparing the differences between normal and obese subjects. DMPs located in gene regions were assigned to corresponding genes, which were defined as differentially methylated genes (DMGs). The threshold value was set as |log 2 fold-change| ( β) > 0.05, P < 0.05. Then, we matched the DEGs with DMGs, and only the matched genes (DEMGs) were selected for further analysis.

Functional Enrichment Analysis
All functional enrichment analysis on DEGs was performed on clusterProfiler and Dose package in R (12). The complete functional enrichment analysis includes Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG) approach, and Disease Ontology (DO). The threshold value of analysis was set as adjust-P < 0.05 and error detection rate [false discovery rate (FDR)] <0.05.

Protein-Protein Interaction Network and Module Analysis
We used the string database (version 11.0) (13) to explore protein prediction and experimental interactions. There are many methods of database prediction, including co-expression experiment, text mining, co-occurrence, gene fusion, database, and neighborhood. Also, we used the combination fraction to reveal the protein pair interactions in the database. Then, we localized DEMGs to PPIs to identify the key genes in the network with the cutoff value set to a comprehensive score >0.9 (14). As a valuable method, a degree is used to study the role of protein nodes in the network. Using the molecular complex detection (MCODE) on Cytoscape (version 3.71), the most significant clustering module and the main clustering module were explored (15,16). For further analysis, we set ease ≤0.05 and set ≥2 as the cutoff value and MCODE score >8 as the threshold value.

Validation of DEMGs
We used prism 8.0 GraphPad (17) for scatterplots of methylation and gene expression to detect the relationship between methylation and gene expression. Then, we calculated the correlation equation to judge whether the equation has statistical significance. The DEMGs were validated in GSE109597, which contained 84 unrelated samples. After grouping according to BMI (>30 and <30), the expression of DMEGs in the two groups was compared with ggplot2 in R.

Data Preprocessing
A quality control processing of GSE88837 showed that when all the means in the strip chart lie on the same horizontal line, all samples were normalized (Figure 1). We obtained 54,560 expression probes from each gene map and expression. Probes with too low or too high probe expression were defined as outliers and eliminated from further analysis. An average expression value was used to screen the DEGs to prevent too many probes corresponding to one gene. The limma software package was used to calculate the DEGs, which yielded 1,814 DEGs with P values <0.05. We use the obese population as a reference, among these, 150 with a log2 (fold change) > 2 were defined as upregulated and 199 with a log2 (fold change) < 2 as downregulated. The heat map and volcano map of DEGs are shown in Figures 2A,B.
Out of the 485,579 DNA methylation sites in GSE88940 VAT that were screened for quality control, 454,325 methylation sites were selected for analysis. Here, 10,016 DMPs (| β| > 0.05, P < 0.05) were identified, of which 666 were hypermethylated and 3,349 were hypomethylated. After annotation, 10,016 DMPs are located in 4,024 unique genes, which were identified as DMGs. Matching DMGs with DEGs yielded 77 genes for the next analysis (Figure 3). The details of 77 genes are shown in Table 1 and

Functional Annotation
The cluster profiler package in R for DO function, KEGG pathway enrichment, and GO analysis was used to clarify the role of the DEGs. Here, 104 cell components, seven  Enrichments including GO: 0061448 connective tissue development, GO: 0007584 nutritional response, GO: 0022600 digestive system process, GO: 0045444 adipocyte differentiation, hsa04010 MAPK signal pathway, hsa04657 IL-17 signal pathway, hsa0061 fatty acid biosynthesis, hsa04933 age-old signal pathway in diabetic complications, DOID: 9352 Type 2 diabetes mellitus; DOID: 374 nutritional diseases; DOID: 654 over nutrition; DOID: 9970 obesity have been previously reported to be related to obesity, and thus genes in these clusters were selected for further analysis.

Construction of Protein-Protein Interaction Network and Identification of Key Genes
The string database was used to clarify gene interaction networks of the selected genes on Cytoscape, which revealed 5,598 pairs of proteins and 1,003 nodes when the cutoff value was set to a comprehensive score >0.9 ( Figure 6A). MCODE analysis showed that the scores of the four modules were more than 10 (Figures 6B-E). Analysis of 98 genes in these four modules revealed that two of them were highly correlated and carried out submodule analysis to screen out GO, DO, and KEGG data. The two genes were S100 calcium-binding protein A8 (S100A8) and S100 calcium-binding protein A9 (S100A9).

Hub Genes Validation
First, we tested the relationship between S100A8 and S100A9 methylation and gene expression in the same 20 individuals in GSE88837 and GSE88940. We found that there was a negative correlation between methylation level and gene expression. The results also showed that the two correlation equation lines were statistically significant (P < 0.05; Figure 7). Validation in GSE109597 showed that the expression of S100A8 and S100A9 was significantly higher in obese vs. normal subjects (P < 0.05; Figure 8).

DISCUSSION
Emerging evidence has shown that obesity is not only a simple nutritional excess but also a metabolic disorder, which is the precursor of many metabolic diseases (18). More and more studies have confirmed that chronic inflammation is the result of the accumulation of fat cells caused by abnormal gene function (19). In the present study, the relationship between gene expression and obesity in 20 obese patients was analyzed using a gene-chip dataset. We found that the high expression of S100A8 and S100A9 was related to obesity. Also, a significant increase in S100A8 and S100A9 expression was observed in the validation microarray dataset of 84 samples. Whole-genome methylation analysis showed hypomethylation of the S100A8 and S100A9 promoters. Changes in methylation often lead to abnormal gene function and diseases and, in this case, may be an important cause of obesity.
Calprotectin is a heterodimer composed of S100A8 (calprotectin a, MRP8) and S100A9 (calprotectin B, MRP14) subunits, which are low-molecular weight members of S100 calprotectin subfamily (20). Previous studies have found that S100A8 and S100A9 are related to obesity, insulin resistance, and atherosclerosis (21). Sekimoto et al. (22) found that serum S100A8/A9 complex level was related to leukocyte count, BMI, subcutaneous fat area, and visceral fat area. At the same time, compared with lean mice, obese mice had higher S100A8 mRNA expression in the mature adipocyte component and obese mice had higher S100A9 expression in the matrix vascular component (22). Lylloff et al. (23) found that Roux-en-Y gastric bypass surgery (RYGB) significantly decreased BMI and circulating mRNA levels of S100A8 and S100A9 in obese patients. These findings are consistent with the present findings.
Epigenetics has become an intense topic of research in recent years. It refers to the regulation of gene expression without changing the basic structure of the gene. At present, epigenetics usually refers to histone modification and methylation of noncoding RNA and DNA (24,25). Many studies have found that epigenetic mechanisms are associated with obesity. Sonne et al. (26) found changes in the methylation and expression of nine genes in epididymal adipocytes, including ehd2 and kctd15, which are known to be obesity-related genes, and a new candidate gene IRF8, which may be related to the 1/2 balance of immune type. Similarly, Bell (27) and van Dijk et al. (28) also explained the relationship between DNA methylation and obesity. It is worth noting that these two studies are comprehensive in content, more stringent in inclusion criteria, extensive in research content, and very reliable in conclusions (27,28). Benton et al. (29) demonstrated the relationship between DNA methylation profiles of adipose tissue and weight loss before and after gastric bypass. This study provided a strong basis for future work and additional evidence for the role of DNA methylation     in adipose tissue in obesity. The findings that the promoters of both the S100A8 and S100A9 genes were hypomethylated, in addition to their significantly increased expression in obese people, confirmed the validity of our findings. Other studies showed that methylation is closely associated with changes in adiposity. Brask et al. (26) found that obesity was associated with specific changes in adipocyte DNA methylation and gene expression. In general, obesity was associated with the global DNA hypomethylation significantly and epidermal fat exhibited more obesity-associated Differentially methylated regions (DMRs) than inguinal fat. Among genes that exhibited simultaneous changes in methylation and gene expression,  some were involved in adipose tissue function in obese mice through hypomethylation-driven changes in expression.
On the other hand, Riuzzi et al. (30) found that levels of S100A8/S100A9 were closely correlated with adipocyte size, and the mechanism may be related to the involvement in regulating the cell division cycle. Our study revealed that elevated levels of S100A8/S100A9 were closely associated with the development of obesity, while hypomethylation of the promoter region resulted in upregulation of S100A8/S100A9 expression. There are a few limitations to this study. First of all, our data results only come from microarrays. Although it is clear that these two hub genes play roles in the pathogenesis of obesity, in vitro and in vivo studies are required to validate the relationship between these genes and obesity.

CONCLUSION
In summary, two GEO microarrays were explored for differential methylation and gene expression as epigenetic factors of obesity. After the functional analysis, we selected two DEMGs from another microarray dataset (including 84 samples) for verification. Our results showed hypomethylation and upregulation of S100A8 and S100A9 expression in obese patients. Also, correlation analysis showed that DNA methylation can regulate gene expression and lead to obesity.

DATA AVAILABILITY STATEMENT
Publicly available datasets were analyzed in this study. This data can be found here: https://www.ncbi.nlm.nih.gov/geo/; Accession No.'s GSE88837, GSE88940 and GSE109597.

ETHICS STATEMENT
All research protocols on this topic have been approved by the Ethics Committee of Guangxi Medical University (LSGXMU-2019-0028). The patients/participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
NC conceived the study, participated in the design, undertook genotyping, performed the statistical analyses, and drafted the manuscript. LM participated in the design and analyzed the bioinformatics results. WL and DZ conceived the study and helped draft the manuscript. LH, JH, and WS collaborated for the genotyping. LL, YL, and HL carried out the epidemiological survey and collected the samples. SP and JP supervised the whole work. All authors read and approved the final manuscript.