An Integrated Regulatory Network Based on Comprehensive Analysis of mRNA Expression, Gene Methylation and Expression of Long Non-coding RNAs (lncRNAs) in Myelodysplastic Syndromes

Myelodysplastic syndromes (MDS) are a heterogeneous group of disorders characterized by ineffective hematopoiesis, defective differentiation of hematopoietic precursors, and expansion of the abnormal clones. The prevalence of MDS has raised great concerns worldwide, but its pathogenetic mechanisms remain elusive. To provide insights on novel biomarkers for the diagnosis and therapy of MDS, we performed high-throughput genome-wide mRNA expression profiling, DNA methylation analysis, and long non-coding RNAs (lncRNA) analysis on bone marrows from four MDS patients and four age-matched healthy controls. We identified 1,937 differentially expressed genes (DEGs), 515 methylated genes, and 214 lncRNA that showed statistically significant differences. As the most significant module-related DEGs, TCL1A, PTGS2, and MME were revealed to be enriched in regulation of cell differentiation and cell death pathways. In addition, the GeneGo pathway maps identified by top DEGs were shown to converge on cancer, immunoregulation, apoptosis and regulation of actin cytoskeleton, most of which are known contributors in MDS etiology and pathogenesis. Notably, as potential biomarkers for diagnosis of MDS, four specific genes (ABAT, FADD, DAPP1, and SMPD3) were further subjected to detailed pathway analysis. Our integrative analysis on mRNA expression, gene methylation and lncRNAs profiling facilitates further understanding of the pathogenesis of MDS, and may promote the diagnosis and novel therapeutics for this disease.

Myelodysplastic syndromes (MDS) are a heterogeneous group of disorders characterized by ineffective hematopoiesis, defective differentiation of hematopoietic precursors, and expansion of the abnormal clones. The prevalence of MDS has raised great concerns worldwide, but its pathogenetic mechanisms remain elusive. To provide insights on novel biomarkers for the diagnosis and therapy of MDS, we performed high-throughput genome-wide mRNA expression profiling, DNA methylation analysis, and long non-coding RNAs (lncRNA) analysis on bone marrows from four MDS patients and four age-matched healthy controls. We identified 1,937 differentially expressed genes (DEGs), 515 methylated genes, and 214 lncRNA that showed statistically significant differences. As the most significant module-related DEGs, TCL1A, PTGS2, and MME were revealed to be enriched in regulation of cell differentiation and cell death pathways. In addition, the GeneGo pathway maps identified by top DEGs were shown to converge on cancer, immunoregulation, apoptosis and regulation of actin cytoskeleton, most of which are known contributors in MDS etiology and pathogenesis. Notably, as potential biomarkers for diagnosis of MDS, four specific genes (ABAT, FADD, DAPP1, and SMPD3) were further subjected to detailed pathway analysis. Our integrative analysis on mRNA expression, gene methylation and lncRNAs profiling facilitates further understanding of the pathogenesis of MDS, and may promote the diagnosis and novel therapeutics for this disease.

INTRODUCTION
Myelodysplastic syndromes (MDS) are a group of hemopathies featured with various degrees of ineffective hematopoiesis, and confer the host with inherent risk of progression to acute myeloid leukemia (1)(2)(3)(4). Although actual epidemiology are unknown, the incidence of MDS is increasing, attributing to the growing aging population, the use of cytotoxic agents in treating diseases, and environmental carcinogens such as organic solvents (1,(5)(6)(7). MDS affect mainly elderly patients. Based on previous literatures, the annual incidence of MDS is ∼3.5-10 per 100,000 in the general population, while 12-50 per 100,000 in elderly population (3,4,8).
Due to the heterogeneity of MDS, the molecular pathogenesis of MDS remains poorly understood, which has significantly hindered the development of new strategies to improve the therapies (7). Although the molecular characterization of MDS continues to be controversial, various studies demonstrated the clonal involvement of the myeloid lineages. Importantly, FIGURE 1 | Flowchart of high-throughput analysis for data integration. High-throughput genome-wide DNA methylation, mRNA, and lncRNAs expression profiling were performed on BM samples from four MDS patients and four age-matched healthy controls. DEGs, differentially methylated genes and differentially expressed lncRNAs were initially screened out. miRNAs were identified using regulatory data collected from the microcode database. Target genes were predicted by six databases including the mirTARbase, Tarbase, TargetScan version 6.2, miRBase 18, Diana version 4.0, and targetMiner. All the DEGs, differentially methylated genes, and differentially expressed lncRNAs-predicted targets were integrated to construct the regulatory network of MDS. the clonal mutation and expansion of abnormal myeloid cells are tightly associated with the dysfunction of pluripotent or multipotent hematopoietic cells, which may establish the MDS phenotype and determine natural disease course. The fundamentally complex genetic and biologic abnormalities in pathogenesis are also demonstrated by the heterogeneity of the clinical and morphologic pictures of MDS (8). Even now, the etiology and pathogenesis of MDS remain inadequately characterized.
Gene expression profiling can identify key deregulated genes and pathways and new prognostic gene signatures in MDS. Recent advances in the molecular pathogenesis of MDS are leading to new biological, clinical, and therapeutic insights. Over the course of the past decade, the application of novel high throughput technologies to the study of MDS has led to the identification of recurrently mutated genes in this disease. For example, microarray-based gene expression profiling has expedited the identification of many differentially expressed genes in MDS, and also underscored several critical gene pathways deregulated in this hematopoietic malignancy (9). Comprehensive genomic profiling of MDS and acute myeloid leukemia (AML) cases have remarkably enabled the detection of genes functioning as drivers of differentiation and subclonal mutations, which could profoundly facilitate the timely diagnosis, accurate risk prognostication and targeted therapies (10).
The aim of this study is to construct an integrated, MDSspecific regulatory network, further providing novel MDS signature genes, and pathways that can be translated to potential novel biomarkers or approaches for the diagnosis and therapy of MDS. We performed high-throughput genome-wide mRNA expression profiling, DNA methylation analysis, and lncRNA analysis on bone marrows from four MDS patients and four age-matched healthy controls. Genomic profiling was used for construction of an integrated regulatory network. Additionally, GO analysis and pathway enrichment analysis for MDS were performed. Notably, four MDS-specific genes identified as potential biomarkers in diagnosis of MDS were analyzed in detail by pathway analysis. Our findings provide complementary understanding of MDS pathogenesis, and lead to a better insight into the development of novel strategies for diagnosis and therapy of this disease.

Patients and Samples
A cohort of four MDS patients and four healthy controls were included in this integrative parallel study on high-throughput global gene profiling, as previously reported (11). The subtype of four MDS patients was refractory anemia with excess blasts (RAEB), including three male and one female, and the median age was 67 years. Diagnosis of the MDS patients was based on the criteria defined by the International System for Human Cytogenetic Nomenclature. Bone marrow (BM) mononuclear cells samples were isolated by Ficoll solution (GE Healthcare) according to the manufacturer's instructions. Ethical clearance and approval was obtained from the local institutional research board (Code, 2009M-012). The study protocol was carefully explained to the participants and participation was fully voluntary. Written informed consent was obtained from all participants and they agreed to publish their individual data. All procedures were done according to the standard with a minimum risks.

DNA Extraction and Microarray-Based Genomic Profiling
Genomic DNA was isolated using a QIAamp DNA Blood Mini Kit (Qiagen) from BM samples according to the standard procedures. The Infinium BeadChip DNA methylation array was carried out for profiling of human whole genome genes, includes CpG sites, CpG islands and non-coding RNA, as previously reported (11). A total of 27,958 Entrez Gene RNAs and 7,419 lincRNAs were included in the 60K BeadChips for human genome-wide gene expression (Agilent). Original data have been deposited in Gene Expression Omnibus (GEO) of the National Center for Biotechnology Information, with public online access links (GSE51758 and GSE51757).

Differentially Expressed Genes (DEGs) Screening
R software was used to process the microarray data (12). Limma (Linear Model for Microarray) package in R was used to identify DEGs between MDS patients and control group (13). After the data was normalized by log2 transformation, the Bayes moderated t-test (4) was recruited for the identification of DEGs. The screening criterion for DEGs was set up as the FDR (false discovery rate) <0.05 and |log FC (fold change)| >1.

Construction of Integrated Regulatory Network
Systematic integration of various high-throughput datasets was performed for the construction of integrated regulatory network. Targets of one microRNA (miRNA) with strong experimental evidence were obtained in the mirTARbase and DIANA-TarBase v8 as known targets. TargetScan version 6.2, miRBase version 18, Diana version 4.0 and targetMiner were used to aggregate potential targets sets. DEGs predicted by more than three tools were selected. The Cytoscape software (http://cytoscape.org/) Frontiers in Oncology | www.frontiersin.org was employed for the final assembly of the integrated gene regulatory network.

GO and KEGG Enrichment Analyses of DEGs
DAVID (https://david.ncifcrf.gov/) was employed to perform Gene Ontology (GO) enrichment analysis, based on the hypergeometric distribution method. The criteria of count number ≥2 and P-value < 0.05 were selected as the threshold for GO enrichment. Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis was performed, as previously described (https://www.genome.jp/kegg/), to identify the main functional pathways in DEGs involved in metabolism of MDS. The most significantly enriched pathway-related DEGs Frontiers in Oncology | www.frontiersin.org were mapped to the corresponding KEGG pathways. The thresholds were set as count number <3 and P-value < 0.05.

A Network Framework of Data Integration
By analyzing the microarray data from four MDS and four control samples according to designed flowchart (Figure 1) Using regulatory data collected from the microcode database, we identified 72 microRNAs. Subsequently, 1,348 genes were predicted as potential targets of these 72 mircoRNA through six databases: mirTARbase, Tarbase, TargetScan version 6.2, miRBase 18, Diana version 4.0, and targetMiner. Then we focused on these genes for further analysis to construct an integrated MDS regulatory network.

Construction of Global Gene Profiling Based Integrated Regulatory Network for MDS
By integrating a large number of hypermethylated genes, DEGs, DEG-related miRNAs, transcription factors (TFs) and lncRNAs collected from the above-mentioned six databases, a regulatory network was built. As shown in Figure 3, this network included three major network components: TF-gene-lncRNA regulatory network, TF-miRNA-lncRNA regulatory network and miRNAgene-methylation regulatory network.

GO Analysis of Module-Related DEGs
In order to biologically evaluate the MDS-specific gene regulatory network, we carried out GO analysis. Specifically, we evaluated our network based on functional knowledge about genes that are involved in similar biological processes. We found that all biological processes are internally connected. While the number for maximally overlapped DEGs of two biological processes was 130, the number for minimally overlapped DEGs of two biological processes was one. To gain further insight into the function of DEGs in our network modules, module-related DEGs were annotated by the online biological classification software DAVID. Some target genes are statistically enriched in the GO terms of apoptosis and immune response. As shown in Table 1, the top ten enriched GO terms include: intracellular signaling cascade, regulation of apoptosis (Figure 4), phosphate metabolic process, immune response, defense response, regulation of cell proliferation, positive regulation of biosynthetic process, phosphorylation, regulation of transcription from RNA polymerase II promoter, and biological adhesion.

Pathway Analysis of Module-Related DEGs
We also identified the KEGG pathways that are significantly enriched in MDS patients, and many of these pathways are associated with cancer. The top ten enriched KEGG pathways include: pathways in cancer, MAPK signaling pathway, chemokine signaling pathway (Figure 5), regulation of actin cytoskeleton, focal adhesion, endocytosis, leukocyte transendothelial migration, natural killer cell mediated cytotoxicity, neurotrophin signaling pathway, and insulin signaling pathway ( Table 2). The top GeneGo pathway maps regulated by DEGs converged on immunoregulation, apoptosis and cell adhesion, most of which are known to be contributors in MDS etiology and pathogenesis.
In order to better understand the mechanisms underlying the interactions between the pathway-related DEGs, we mapped the DEGs in the above-mentioned pathways to their corresponding KEGG pathways ( Figure 5). As shown in the map, for example, the Raf-1 proto-oncogene RAF1 was involved in more than 20 pathways, including apoptosis and regulation of cell migration. The genes STAT and PAK1 were simultaneously involved in more than two pathways.

In-depth Pathway Analysis of Four Candidate DEGs for MDS Diagnosis
Our previous work has demonstrated that there were six DEGs identified as potential biomarkers for diagnosis of MDS, including ABAT, DAPP1, FADD, LRRFIP1, PLBD1, and SMPD3 (11). In order to better verify the potential diagnostic value and confirm their values as potential therapeutic targets, we performed a detailed in-depth pathways analysis for these six DEGs. However, as shown in Table 3, only four genes (ABAT,

DISCUSSION
MDS are a heterogeneous group of hematopoietic neoplasms characterized by defective differentiation of hematopoietic stem cells and bone marrow dysplasia (1). These characteristics limit the deeper going into its etiology and pathogenesis. The outcomes of MDS patients have not substantially improved over the last few decades, due to the lack of a comprehensive understanding of the pathogenesis. In this study, we completed the construction of an integrated MDS regulatory network, which not only facilitates the further understanding of the pathogenesis of MDS, but also can promote the diagnosis and novel therapeutics for this disease. Currently, the application of new high-throughput technologies is promising to break the frontiers in the field of MDS etiology and pathogenesis. It has been proved that the network analysis is useful in unraveling the complexity of biological regulations (14)(15)(16)(17)(18)(19)(20)(21)(22). For instance, aggressive behavior of breast cancer is related to cell proliferation and hormone stimulus via an integrated gene regulatory network (23). Cancers share a common regulatory network involving histones and regulators of the cell cycle and immune responses by mining gene expression data, which have been demonstrated by Knaack et al. (24). Lots of mutated genes and driver pathways are important for the pathogenesis of MDS using RNA-seq technology (7).  In addition, Tawana et al. believe that we will have a better understanding of the role of an increasing number of inherited genetic factors on MDS/AML incidence and management through high-throughput gene exploring (10). Our global genome profiling between MDS patients and healthy controls resulted in more down-regulated DEGs  (16), and others may contribute to the hallmarks of morphologic dysplasia observed in MDS, due to apoptosis, regulation of the cytoskeleton and immunoregulation (7). LncRNAs have been involved in the regulation of a variety of biological functions in both physiological and pathogenic conditions. Many researchers have detected aberrantly expressed lncRNAs in human cancers (18,(25)(26)(27)(28)(29)(30) and found that these lncRNAs are implicated in the regulation of many critical oncogenes or tumor suppressor genes, such as p53 (31). By targeting other genetic and epigenetic regulators, lncRNAs can also regulate the transcriptional and translational output in cells (32). Many studies have reported that lncRNAs have been identified as important participants in the pathogenesis and disease progression of AML. So it seems possible that profiling of their expression would have significant impact on MDS diagnosis and prognosis. Yao et al. provide evidence that higher expression levels of four lncRNAs in their scoring system are associated with distinct clinical and genetic features, and they conclude that the high abundance of these lncRNAs could predict inferior outcomes of MDS patients (33). Moreover, cancer associated lncRNAs may serve as diagnostic or predictive biomarkers, and also imply new therapeutic strategies focusing on selectively silencing these lncRNAs in treating cancers (34). In this study, we identified 214 lncRNAs, and 28 of them had the record of transcripts. However, further in-depth studies are still required to thoroughly reveal the functions of these lncRNAs and their significances in MDS.
Epigenetic alterations is widely accepted as a common event in carcinogenesis, and aberration of epigenetic regulation may be considered functionally equivalent to classical genetic alterations (11). DNA methylation has an important role in hematological malignancies, and accumulating data indicate that epigenetic modifications are attractive therapeutic targets in any oncopathological state (35). We identified 515 methylated MDSspecific genes in our investigation on global gene profiling. In our previous work, we defined six genes (ABAT, DAPP1, FADD, LRRFIP1, PLBD1, and SMPD3) as CIMP (the CpG island methylator phenotype) markers of MDS, which suggests a non-invasive approach for the diagnosis and prognosis of MDS (11). In this study, we further analyzed the data to pinpoint miRNA, lncRNAs associated with these six genes. Four genes (ABAT, DAPP1, FADD, and SMPD3) were identified to possess their functional pathways. Further studies will be carried out by our research group to unveil their significances in pathogenesis of MDS. Through using high-throughput technologies in this era, the incorporation of more relevant parameters would undoubtedly move us one step further toward the diagnosis and therapy of MDS.

CONCLUSION
In summary, we performed microarray on BM samples from four MDS patients and four age-matched healthy controls. In order to construct an integrated regulatory network to uncover potential novel biomarkers for the diagnosis and therapy of MDS, we undertook multiple analyses, including high-throughput genome-wide mRNA expression profiling, DNA methylation analysis, and lncRNAs analysis. Additionally, GO analysis and pathway enrichment analysis for MDS were performed. Meanwhile, six MDS-specific genes previously identified as potential biomarkers in MDS diagnosis were re-analyzed in detail. Our findings provide valuable resources for in-depth understanding the pathogenesis of MDS, and also lead to better insights on developing novel strategies for diagnosis and therapy of this disease.

AUTHOR CONTRIBUTIONS
JL and XW designed the experiments. XZ and HY carried out most of the experiments and analyzed the data. NL, YZ, WS, SQ, and GH assisted in analyzing the data. XZ wrote the article.

FUNDING
This work was supported by grants from National Natural Science Foundation of China (81600096, 81570141, 81522001).