Characterization of DNA Methylation and Screening of Epigenetic Markers in Polycystic Ovary Syndrome

Polycystic ovary syndrome (PCOS) is a heterogeneous endocrine and metabolic disorder in women, which is characterized by androgen excess, ovulation dysfunction, and polycystic ovary. Although the etiology of PCOS is largely unknown, many studies suggest that aberrant DNA methylation is an important contributing factor for its pathological changes. In this study, we investigated DNA methylation characteristics and their impact on gene expression in granulosa cells obtained from PCOS patients. Transcriptome analysis found that differentially expressed genes were mainly enriched in pathways of insulin resistance, fat cell differentiation, and steroid metabolism in PCOS. Overall DNA methylation level in granulosa cells was reduced in PCOS, and the first introns were found to be the major genomic regions that were hypomethylated in PCOS. Integrated analysis of transcriptome, DNA methylation, and miRNAs in ovarian granulosa cells revealed a DNA methylation and miRNA coregulated network and identified key candidate genes for pathogenesis of PCOS, including BMP4, ETS1, and IRS1. Our study shed more light on epigenetic mechanism of PCOS and provided valuable reference for its diagnosis and treatment.


INTRODUCTION
Polycystic ovary syndrome (PCOS) is a common endocrine and metabolic disorder. According to the Rotterdam or Androgen Excess and Polycystic Ovary Syndrome diagnosis criteria, PCOS is characterized primarily by hyperandrogenism and accompanied by hirsutism, androgenetic alopecia, and acne (Fauser et al., 2004;Azziz et al., 2009). Other features include polycystic ovarian morphology (PCOM) and ovulation dysfunction, such as oligo-ovulation and anovulation (Qi et al., 2019). PCOS affects approximately 5-20% of women of reproductive age (Fauser et al., 2004). Girls with hereditary PCOS begin to develop hyperinsulinemia as early as age 4 years, and premature pubarche and menstrual irregularity during adolescence affect their health. Postmenopausal women with PCOS have increased risk of cardiovascular and cerebrovascular diseases (Macut et al., 2017). Furthermore, long-term morbidity of PCOS is associated with various complications. Many patients suffer from metabolic syndrome, which increases the prevalence of type 2 diabetes mellitus (T2DM) (Moran et al., 2010;Gambineri et al., 2012) and gestational diabetes (Pan et al., 2015). Some studies also indicated that women with PCOS are more likely to suffer from depression and anxiety (Dokras et al., 2011). Moreover, recent studies have shown that the gut microbiota is changed in individuals with PCOS (Qi et al., 2019).
Genetic and lifestyle factors contribute to pathogenesis of PCOS (Edgar Ricardo et al., 2019). Maternal hyperandrogenism and insulin resistance can be transmitted to offspring, and obesity induced by imbalance of food intake and energy expenditure can lead to increased androgen and influence the severity of insulin resistance, which in turn contribute to the development of PCOS (Carmina, 2003;Merkin et al., 2016). However, no single gene has been identified as the common etiology of PCOS, and recent studies have turned attention to the epigenetic mechanism of PCOS. Multiple genome-wide studies suggested that epigenetic factors, such as non-coding RNAs and DNA methylation, are closely associated with PCOS (Yu Y.Y. et al., 2015;Mu et al., 2021). microRNA is involved in proliferation, apoptosis, and steroid production of ovarian cells (Sirotkin et al., 2010). The miR-513a-3p is expressed in human ovarian granulosa cells (GCs) and plays a central role in ovarian follicular maturation, ovulation, and maintenance of corpus luteum . However, in PCOS patients, miR-99a and miR-323 target insulin-like growth factor 1 receptor (IGF-1R) and IGF-1, respectively, to regulate GC apoptosis (Geng et al., 2019;Wang et al., 2019). In addition, hundreds of lncRNAs were aberrantly expressed in cumulus cells from PCOS patients (Huang et al., 2016), and CTBP1-AS was identified as an androgen-responsive lncRNA that promotes transcriptional activity of androgen receptor and cell cycle progression (Liu et al., 2017).
DNA methylation, as an important content of epigenetics, plays an important role in pathogenesis of PCOS. Studies showed that DNA methylation is increased in the promoter region of the peroxisome proliferator-activated receptor gamma 1 (PPARGC1A) and represses its expression. Reduced PPARGC1A expression is associated with insulin resistance, high serum androgen levels, and reduction of mitochondrial DNA content in women with PCOS (Zhao et al., 2017). Conversely, DNA methylation level of LHCGR gene promoter is reduced in PCOS, and its overexpression leads to increased LH in GCs, which in turn leads to gonadotropin disorder in PCOS women (Mutharasan et al., 2013). In addition, many studies reported that expression of genes involved in cellular processes, such as lipid and steroid synthesis and sugar metabolism, is altered by abnormal DNA methylation and also contributes to the pathogenesis of PCOS (Salehi Jahromi et al., 2018) (Huang et al., 2007;Kokosar et al., 2016).
In this study, we tried to elucidate the epigenetic regulatory mechanism of PCOS by integrating DNA methylation, transcriptome, and miRNA profile in GCs of PCOS. Our study depicted the DNA methylation characteristic of PCOS. Importantly, we predicted a DNA hypomethylation and miRNA coregulated network in PCOS and identified several marker genes using bioinformatics method. The findings of study provide valuable reference for diagnosis and treatment of PCOS.

Dataset Collection
The data used in this study was downloaded from Gene Expression Omnibus (GEO) (Mao et al., 2021). The RNA-seq data were obtained under accession number GEO: GSE155489. The DNA methylation MBD-seq data were obtained under accession number GEO: GSE138573. The miRNA data sequenced by small RNA-seq were obtained under accession number GEO: GSE138572. The data were generated using GCs obtained from PCOS and normal ovaries. Four duplicate samples of PCOS and control GCs were included in RNA-seq data, respectively. Three duplicate samples of PCOS and control GCs were included in MDB-seq, respectively. Five duplicate samples of PCOS and control GCs were included in miRNA data, respectively.

Differentially Expressed Genes and Differentially Expressed miRNAs Analysis
Genes with averaged FPKM < 0.1 in all individual samples were removed, and the remaining genes were considered as expressed genes. Similarity, only miRNAs with averaged FPKM > 0.1 in all individual samples were used in the analysis (Cao et al., 2014). R package DEseq2 was used in differentially expressed gene (DEG) analysis and differentially expressed miRNA (DEmiR) analysis, and read count was used as input. Genes and miRNAs with absolute log2 fold change (FC) > 1 and q < 0.05 were defined as DEGs and DEmiRs, respectively, where q value is the result of p value correction (Cao et al., 2020;Ping et al., 2020).

Analysis of Differentially Methylated Regions
SMART2 package of Python was used for the differentially methylated region (DMR) analysis. The parameter setting was as follows: CpG Distance: 500, AbsMeanMethDiffer: 0.3, p_DMR: 0.05, Euclidean_Distance: 0.2, Segment_Length: 20, and Methylation_Specificity: 0.5 (Liu H. et al., 2015). DNA methylation level of the gene was represented by average DNA methylation level of CpG segments in gene promoters, and | β| > 0.2 was used to define differentially methylated gene (Gao et al., 2018). Hypermethylation and hypomethylation marks of each group were determined by the one-sample t test in SMART2.

PPI Network Analysis
DNA hypomethylation-affected upregulated genes were identified by integrated analysis of transcriptome and DNA methylation. The human protein interaction pairs (9606.protein.links.v11.0) were downloaded from the STRING database 1 and used as background network. The pairwise interaction genes in the DNA hypomethylation regulated gene modules were defined as seed nodes, and the Pearson correlation coefficient was used as the weight of the edges. By combining the background PPI network and the co-expression networks composed of the seed nodes, two gene co-expression networks of candidate markers were obtained (Mao et al., 2021). MCODE application was used for models mining. The parameters settings of MCODE application were as follows: degree cutoff: 2, node score cutoff: 0.2, K-Core: 2, and maximum depth: 100.

Construction of miRNAs Target Gene Network
The target genes of DEmiRs were identified using TarBase database (7.0) 2 (Vlachos et al., 2015). The intersection of miRNA 2 http://diana.cslab.ece.ntua.gr/tarbase/ target genes and DEGs between PCOS and controls were considered as genes that were regulated by miRNA in PCOS.

Gene Ontology and Kyoto Encyclopedia of Genes and Genomes Pathway Analysis
Functional annotation was performed with the DAVID database 3 (Dennis et al., 2003;Sun et al., 2020). Gene Ontology (GO) terms and Kyoto Encyclopedia of Genes and Genomes (KEGG) 3 https://david.ncifcrf.gov/home.jsp Frontiers in Cell and Developmental Biology | www.frontiersin.org pathways for each functional cluster were summarized to a representative term, and p values were plotted to show the significance (Bao et al., , 2021.

Transcriptional Profiles of GCs in Human PCOS
PCOS is a metabolic disease whose etiology has not been fully understood (Li S. et al., 2016). Accumulating evidences suggest that abnormal gene expression is one of the main contributing factors for the development of PCOS. In order to take a deeper insight into the potential molecular mechanism of PCOS, we Frontiers in Cell and Developmental Biology | www.frontiersin.org conducted genome-wide transcriptome analysis based on RNAseq data in GCs obtained from PCOS and normal ovaries.
PCA was first performed to investigate the transcriptional difference between GCs of PCOS and normal ovaries ( Figure 1A). The PC1 axis contributed to the main difference between PCOS and normal GCs, accounting for 73%. On this axis, PCOS and normal GCs were split into two separate clusters showing distinct gene expression profiles. The control samples did not gather together on PC2 axis, which may due to the heterogeneity of normal individuals. This finding was consistent with the clustering of sample correlation coefficient analysis (Supplementary Figure 1A). We further identified 470 upregulated genes and 548 downregulated genes in GCs from PCOS (p adj < 0.05 and |log2FC| > 1; Figure 1B). Among them, PCK1 is the rate-limiting enzyme that regulates gluconeogenesis, and CYP1B1 plays an important role in estrogen metabolism, probably corresponding to the phenotype of obesity and hormonal disturbance in PCOS patients, respectively.
Then we performed unsupervised hierarchical clustering analysis of these DEGs and confirmed a significantly different gene expression pattern between PCOS and the controls ( Figure 1C). We further conducted GO analysis and found the DEGs were significantly enriched in biological processes such as steroid metabolic process, response to insulin, female pregnancy, estrous cycle, response to lipopolysaccharide, and fat cell differentiation (Figure 1C), implying that abnormal transcriptional changes in these biological processes may contribute to the development of PCOS. In support of our hypothesis, some genes involved in these biological processes have already been reported to play important roles in PCOS pathogenesis. For example, nuclear receptor subfamily 4 group A member 1 (NR4A1) is upregulated in ovarian GCs and participates in upregulation of androgen in PCOS patients (Song et al., 2019). CEBPD is a leucine zipper transcription factor involved in inflammation and adipogenesis in PCOS (Ma et al., 2020; Figure 1D). Except for the reported findings, we found one set of genes including ID1, ID2, and ID3 was significantly enriched in biological processes of the circadian rhythm and speculated that insomnia in some PCOS patients may due to the abnormal expression of these genes. In order to take a closer look at the changes in signaling pathways, we carried out KEGG analysis and found that the DEGs were enriched in signaling pathways of tumor necrosis factor (TNF), transforming growth factor β (TGF-β), and metabolic and steroid hormone biosynthesis (Figure 1E). These results suggested that the biological processes and signaling pathways in which the DEGs are enriched could contribute to the development of PCOS.

The Methylome Profile of GCs in Human PCOS
Previous studies have shown that epigenetic changes including DNA methylation are crucial for the development of PCOS. To characterize the abnormal DNA methylation in PCOS, we evaluated overall DNA methylation levels and found remarkable hypomethylation in GCs from PCOS compared with the normal GCs (Figure 2A). In order to take a closer look at the potential regulatory mechanism of aberrant DNA methylation in PCOS, we performed DMR analysis using Python package "SMART2" (Figure 2B) and identified hyper-DMRs and hypo-DMRs, respectively ( Figure 2C). The hypo-DMRs accounted for the majority of DMRs (n = 495), whereas there were only 25 hyper-DMRs that account for 5% of the DMRs. These results indicated that hypomethylation is a key characteristic of PCOS and prompted us to focus on it. GO analyses found that the hypo-DMRs genes were mainly enriched in GO terms of adipose tissue development, glucose homeostasis, and pancreas development, which correspond to the characteristics of acne, obesity, and insulin resistance in PCOS patients (Figures 3A,B). In the KEGG enrichment analysis, the hypo-DMRs genes were found to be mainly involved in insulin secretion, diabetes, dopaminergic synapse, and thyroid hormone signaling pathways, which may associate with the diabetes and hormonal disorders in PCOS patients. Taken together, hypo-DMR plays a critical role in PCOS by altering gene expression in important biological processes and signaling pathways.
In order to investigate the genomic regional preference, the DMRs were mapped to the whole genome. Distribution map of DMRs showed that hypo-DMRs were more prevalent in promoter region near TSS than hyper-DMRs ( Supplementary  Figures 1C,D). Interestingly, besides the longer genomic regions such as the distal intergenic regions and other intron regions, the DMRs were predominantly located in the first intron. This suggested that DNA hypomethylation in the first intron probably plays an important role in regulating gene expression in PCOS (Figures 3C,D).

Integrated Analysis of DNA Methylation and Transcriptome in PCOS
To further explore the role of aberrant DNA methylation in PCOS, we carried out an integrated analysis of the transcriptome and DNA methylome. The genes affected by abnormal DNA methylation were identified by comparing changes in gene expression and methylation levels between PCOS and the normal GCs. Genes with significant changes in DNA methylation levels (abs change >0.2) and expression levels (abs log2 FC > 1) were defined as "methylation-affected genes" (Figure 4A). All genes were divided into methylation-affected genes with either repressed or activated expression. As DNA hypomethylation is an important characteristic of GCs in PCOS and DNA methylation inversely associates with gene expression, we identified a subset of hypomethylation-affected upregulated genes (n = 94) and calculated the Pearson correlation coefficient between gene pairs and performed unsupervised hierarchical clustering ( Figure 4A). Then, two closely related gene modules with strong correlations were generated for subsequent analysis ( Figure 4B). Thus, our analysis identified a number of important PCOS-related genes whose expression level was regulated by DNA hypomethylation.

Analysis of Gene Coexpression Network
Based on the pairwise interaction genes in the aforementioned two modules ( Figure 4B) and human-protein interaction pairs (9606.protein.links.v11.0) downloaded from the STRING Genes with significant changes in DNA methylation levels (abs change >0.2) and expression levels (abs log2 fold change >1) were defined as "methylation-affected genes." Hypermethylation-affected downregulated genes were labeled in blue; Hypomethylation-affected upregulated genes were labeled in red. Frontiers in Cell and Developmental Biology | www.frontiersin.org database, two-gene coexpression interaction networks were constructed by Cytoscape (Figures 4C,D). Based on topological properties of the networks, genes with high degree of node (hub node genes) were identified as candidate genes. Then, three small modules composed of 10 highly connected genes were extracted from the two networks using the Cytoscape application "MCODE" (Figures 4C,D). The genes in these modules were considered as marker genes regulated by hypomethylation, including BMP4, KLF5, PER1, EST1, CRY1, and FAT4. Some of the candidate genes have been reported to play key roles in PCOS such as ETS1 and NR4A1 (Kasch et al., 2018;Song et al., 2019). Among the other genes, BMP4 regulates conversion of white and brown fat and is closely related to the occurrence of T2DM (Hoffmann et al., 2017); the circadian gene PER1 senses progesterone signal during human endometrial decidualization . These genes may be involved in the development of PCOS by regulating insulin metabolism, adipocyte differentiation, and circadian biological processes. In general, we constructed two-gene coexpressed networks of PCOS and identified several PCOS-related target markers regulated by DNA hypomethylation.

The miRNAs Profiles of GCs in Human PCOS
Abnormal expression of miRNAs in female reproductive organs such as uterus, fallopian tubes, and ovaries is involved in pathological changes of the organs (Bagga et al., 2005). To illustrate the regulatory functions of miRNAs in PCOS, we compared miRNA profiles of GCs in PCOS and normal ovaries. We identified 19 upregulated miRNAs and 10 downregulated miRNAs in PCOS, respectively (p adj < 0.05 and |log2FC| > 1; Figure 5A). Unsupervised hierarchical clustering analysis of the DEmiRs showed distinct expression patterns in the PCOS and control GCs (Figure 5B). Of note, miR-141-3p was one of the key miRNA significantly downregulated in PCOS, which was reported to inhibit cell proliferation and promotes apoptosis (Lin et al., 2014). Consistent with previous report in cumulus cells , miR-508-3p was upregulated in GCs of PCOS patients. In order to further explore the function of miRNAs in PCOS, we performed GO and KEGG enrichment analysis on the target genes of the DEmiRs. The target genes of the DEmiRs were mainly enriched in GO terms of circadian rhythm, apoptotic process, cell proliferation, and lipopolysaccharide. In the KEGG enrichment analysis, the target genes were mainly involved in circadian rhythm, TGF-β signaling pathway, and TNF signaling pathway (Figures 5C,D). These results indicated that the DEmiRs contribute to the development of PCOS by regulating some key biological processes and signaling pathways.
It is well known that miRNA can lead to the degradation or translational inhibition of mRNA (Bartel, 2004). Taking the intersection of miRNA target genes and DEGs between PCOS and controls, we identified genes that were most likely regulated by miRNA in PCOS, including upregulated miRNA target genes (n = 106) and downregulated miRNA target genes (n = 56), respectively (Supplementary Figure 1E). In order to elucidate possible regulatory function of miRNAs in gene expression, we performed miRNA-gene interaction networks analysis using Cytoscape (Figures 5E,F). Some of the genes in the networks, such as GDF15, INSIG2, have already been reported to be involved in PCOS. For example, GDF15 is closely related to insulin resistance, hyperandrogenemia, and menstrual disorder in PCOS (Berberoglu et al., 2015). Apart from the reported genes, we newly identified one set of genes that were represented by CD44, IRSI, CYP1B1, and HMGA1. These genes play important roles in ovarian function and androgen metabolism and are likely to be involved in pathogenesis of PCOS. For example, the deficiency of endometrial epithelial CD44 adhesion complex contributes to the endometrial infertility (Paravati et al., 2020) and likely to be involved in the pathogenesis of PCOS by affecting ovarian function. In conclusion, we constructed the miRNAs regulatory networks of PCOS and identified several important target genes regulated by aberrant miRNAs. Importantly, we found 13 genes including BMP4, ETS1, IRS1, FGFR1, CYP1B1, and KLF5, which were coregulated by downregulated miRNAs and hypo-DNA methylation ( Figure 6A). Some of the genes such as ETS1 and FGFR1 have been reported to participate in the development of PCOS (Song et al., 2019;Patil et al., 2020). Among the other genes, KLF5 is an important transcription factor that regulates androgen-AR signaling ; CYP1B1, a dioxin-inducible oxidoreductase, is involved in the metabolism of estradiol (Muneeb et al., 2014); IRS1 is an insulin receptor substrate gene that mediates the control of various cellular processes by insulin (Kasch et al., 2018;Park et al., 2018). These genes were significantly enriched in biological processes of adipocyte differentiation, insulin resistance, female pregnancy, and circadian rhythm, indicating they are likely to be involved in pathogenesis of PCOS. Taken together, our analysis predicted a DNA methylation and miRNA-mediated coregulation profile in PCOS ( Figure 6B).

DISCUSSION
Accumulating evidences indicated that epigenetic alterations occur in the peripheral and umbilical cord blood, as well as in ovary, adipose tissue, and skeletal muscle in women with PCOS (Sirotkin et al., 2010;Roth et al., 2014;Edgar Ricardo et al., 2019). These epigenetic alternations are correlated with systemic and tissue-specific dysfunctions in PCOS, highlighting their importance in PCOS pathogenesis (Edgar Ricardo et al., 2019). As GCs establish a very close relationship with the female gametes even before oogonia differentiation and better represent molecular characteristics of PCOS, we focused on ovarian GCs to investigate the epigenetic regulatory mechanism of PCOS.
Many previous studies suggested that lifestyle and genetic factors contribute to the development of PCOS (Li S. et al., 2016). However, the etiology of PCOS remains unclear. In this study, we investigated DNA methylation characteristics and screened potential epigenetic markers. We identified several key genes, such as IRS1, KLF5, CYP1B1, ETS1, and BMP4, and some signaling pathways including steroid metabolic process, response to insulin, female pregnancy, estrous cycle, and fat cell  differentiation. The alternation of these genes' expression and signaling pathways could contribute to the symptoms of PCOS, such as the type 2 diabetes, infertility, hormonal disorders, and obesity (Combs et al., 2021;Mao et al., 2021). Additionally, we found some DEGs were significantly enriched in circadian rhythm signaling pathway (Figure 1E), suggesting that insomnia observed in some of the PCOS patients might be due to the expression changes of these genes.
Change in DNA methylation is involved in various diseases processes, and aberrant DNA methylation of CYP17, CEBPB, PPARG, and SVEP1 genes has been reported in PCOS (Huang et al., 2007;Kokosar et al., 2016). Consistent with previous studies, we found that overall DNA methylation is reduced in GCs of PCOS  and further identified several marker genes that are regulated by the hypo-DNA methylation, including BMP4, KLF5, IRS1, LPIN1, and ABCC8. Among them, LPIN1 plays a critical role in adipocyte differentiation and lipid metabolism (Chang et al., 2010). Pathogenic variants of ABCC8 are the most common genetic cause of neonatal diabetes and hyperinsulinism (De Franco et al., 2020). Furthermore, these genes were enriched in biological processes of insulin and lipid metabolism, and dysregulation of these genes may contribute to the development of PCOS. Interestingly, the first intron was found to be the main genomic region where DNA hypomethylation happened. As DNA methylation of the first intron inversely associates with gene expression regardless of tissue and species (Anastasiadi et al., 2018), reduction of DNA methylation in the first intron could lead to upregulation in gene expression and disturb normal biological process. This indicated that the first intron was an important genomic functional region whose change in DNA methylation likely contributes to the pathogenesis of PCOS.
Abnormal activation or inhibition of signal pathway is closely related to the development of PCOS. TGF-β signaling pathway and mitogen-activated protein kinase (MAPK) signaling pathway have been reported to be involved in PCOS . In our study, we found the DEmiRs were enriched in TGFβ signaling pathway, MAPK signaling pathway, and circadian rhythm. These findings collectively suggested that TGF-β and MAPK signaling pathway are crucial for the development of PCOS and supported our proposal that miRNAs regulation is an important layer of regulatory machinery in PCOS. miR-141-3p was reported as an important DEmiR in the rat model of PCOS ) and found to regulate target genes such as KLF5, IRS1, and CYP1B1 in GCs of PCOS patients in our study ( Figure 5F). Coincidentally, DNA hypomethylation happens on these target genes, and their expression was upregulated (Figure 6A), indicating a coregulatory mechanism of DNA hypomethylation and miRNA in the pathogenesis of PCOS.
In conclusion, our study indicated that DNA hypomethylation is one of the main characteristics of PCOS, and the first intron was found to be the key genomic elements where DNA hypomethylation was observed, indicating its active involvement in the pathogenesis of PCOS. Importantly, we predicted a DNA hypomethylation and miRNA coregulated network in PCOS and provided several candidate target genes including BMP4, CYP1B1, IRS1, ETS1, and LPIN1. These genes participate in important signaling pathways and biological process and potentially serve as molecular targets for diagnosis and treatment of PCOS. However, our findings are based on genome-wide data analysis and need further validation in experimental models. As developmental competence of oocyte is greatly impaired in PCOS (Alexandria and Jennifer, 2019), it is interesting to investigate the underlying molecular mechanism of oocyte dysfunction in PCOS and clarify its impact on offspring in the future.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/Supplementary Material.

AUTHOR CONTRIBUTIONS
BN and PC conceived the study. PC performed the bioinformatics analysis. BN wrote the manuscript with assistance from PC, WY, PW, and XL. All authors contributed to the article and approved the submitted version.