The Joint Analysis of Multi-Omics Data Revealed the Methylation-Expression Regulations in Atrial Fibrillation

Atrial fibrillation (AF) is one of the most prevalent heart rhythm disorder. The causes of AF include age, male sex, diabetes, hypertension, valve disease, and systolic/diastolic dysfunction. But on molecular level, its mechanisms are largely unknown. In this study, we collected 10 patients with persistent atrial fibrillation, 10 patients with paroxymal atrial fibrillation and 10 healthy individuals and did Methylation EPICBead Chip and RNA sequencing. By analyzing the methylation and gene expression data using machine learning based feature selection method Boruta, we identified the key genes that were strongly associated with AF and found their interconnections. The results suggested that the methylation of KIF15 may regulate the expression of PSMC3, TINAG, and NUDT6. The identified AF associated methylation-expression regulations may help understand the molecular mechanisms of AF from a multi-omics perspective.


INTRODUCTION
Atrial fibrillation (AF), one of the most prevalent heart rhythm disorders, is a potential cause of heart failure and ischemic stroke with high morbidity and mortality (Ogawa et al., 2017;Asmarats et al., 2019). The cause of AF is multifactorial which include age, male sex, diabetes, hypertension, valve disease, and systolic/diastolic dysfunction (Schnabel et al., 2009;Lip et al., 2013;Voukalis et al., 2016). Depends on how often atrial fibrillation occurs and how it responds to treatment, AF is roughly divided into two major subtypes-paroxysmal atrial fibrillation (PAF) and persistent atrial fibrillation (PeAF). In the treatment of AF, drugs were the first choice, non-drug therapies were used only when drug therapy failed or patients could not tolerate the medication. In contrast to the extensive knowledge of etiology, the underlying mechanism of AF remains elusive. Further study of the potential mechanisms of AF could provide novel strategies for the treatment and management to increase quality of life and reduce economic burden of social (Chugh et al., 2001).
Epidemiological studies have demonstrated that genetic, environmental, behavioral, and clinical factors contribute to AF pathogenic mechanism (Zhong et al., 2016). Emelia J. B performed genome-wide methylation using whole blood samples from 183 prevalent AF and 220 incident AF cases. They examined the association between DNA methylation and GWAS loci, suggesting DNA methylation might be a possible mechanism through which AF-specific genetic variations affect gene regulation (Lin et al., 2017). To date, only a few studies have investigated differential DNA methylation as a predictor biomarker at specific candidate loci that were previously associated with AF.
Therefore, we applied DNA methylation profiling study to identify the likely rare damaging variants and putative candidate genes from 10 patients with persistent atrial fibrillation (PeAF), 10 patients with paroxymal atrial fibrillation (PAF) and 10 healthy individuals. Interestingly, we identified top 10 genes (KIF15, ABCA3, FOXG1, VGF, PDE4D, EIF3C, CNTNAP5, SHOX2, VGF, TRIM59) as functional candidate genes and the expression level are significantly increased in PeAF and PAF patients than control. Given the importance of DNA methylation to gene expression, we investigated the gene expression of the same participants using RNA sequencing. We also defined top 10 genes (EPN3, EMD, SMCO4, F2RL2, TMED1, PSMC3, PDZD11, NUDT6, TINAG, GALNT5) in gene expression data and the expression pattern of these genes was significantly different between PeAF and PAF. These results have improved our understanding of the underlying mechanism and offer new insights into the potential pathway of AF, which could provide novel therapeutic option for this disease.

Atrial Fibrillation Patients
Ten patients with paroxymal atrial fibrillation (g1), 10 patients with persistent atrial fibrillation (continuous atrial fibrillation lasting more than 12 months) (g2) and 10 healthy individuals (g3) were enrolled in this study ( Table 1). All patients were subjected to detailed medical evaluation, which included medical history, physical examination, electrocardiography (ECG), and echocardiography. Patients with chronic heart failure, coronary heart disease, cardiomyopathy, hyperthyroidism or chronic pulmonary heart disease were excluded.
The study was conducted in accordance with the Declaration of Helsinki, and the protocol used to collect human heart tissue was approved by the Ethics Committee of Shanghai East Hospital (DI: 0402017).
Written informed consents to participate in this study were provided by all the enrolled patients before operation of fibrillation ablation. The left atrial appendage tissues which were abandoned during isolated surgical ablation were collected. Normal left atrial appendages were collected from healthy male donors.

The Methylation Profiles
The DNA methylation status of 850K probes in the 30 samples was measured using Methylation EPICBead Chip. The raw data was quality controlled and preprocessed using R/Bioconductor package minfi 1 (Aryee et al., 2014). The beta value ranged from 0 to 1 was calculated to represent how each position was methylated. 1 meant high methylation and 0 meant low methylation.

The RNA Sequencing Profiles
The total RNAs were extracted using RNeasy Mini Kit (Cat#74106, Qiagen) and the RNA integrity was checked using Agilent Bioanalyzer 2100 (Agilent technologies, Santa Clara, CA, United States). Qualified total RNA was further purified by RNAClean XP Kit (Cat A63987, Beckman Coulter Inc., Kraemer Boulevard, Brea, CA, United States) and RNase-Free DNase Set (Cat#79254, QIAGEN, GmBH, Germany). Pair-end sequencing reads were generated using Illumina data collection software. First, the reads were mapped onto human reference genome GRCh38 using Hisat2 (version:2.0.4 2 ) (Kim et al., 2015). Then, Stringtie (version:1.3.0 3 ) (Pertea et al., 2015) was used to calculate the FPKM (Fragments Per Kilobase of exon model per Million mapped reads).

Feature Selection Algorithm
There were 866,091 methylation probes and 50,868 RNAs. The number of features were extremely large. It was difficult to select key features using traditional statistical methods. Therefore, we adopted the latest machine learning based feature selection method Boruta to get the key methylation probes and RNAs.
Boruta (Kursa and Rudnicki, 2010) is a feature selection method based on random forest. It can select sample group relevant features effectively. First, it will shuffle the features to create many permuted datasets. Then, it will evaluate the importance score of each feature in the original actual dataset and permuted datasets. Then, it will compare the actual importance score with permuted scores and find the features with significantly higher actual importance scores than permuted scores. After multiple iterations, it will select all the sample group relevant features. The python code from https://github. com/scikit-learn-contrib/boruta_py was used to apply the Boruta feature selection algorithm.

Classification Predictor
To evaluate how well the selected features can classify the samples, we built an SVM (Support Vector Machine) classifier using the methylation data and another RNA-Seq data based SVM classifier. The svm function in R package e10171 4 (Chang and Lin, 2011) was used to apply the SVM classification algorithm. LOOCV (leave-one-out cross validation) was used to objectively evaluate the classification performance. Each time, one sample was treated as test sample while all the other samples were used to train the model. After 30 rounds, all samples had been tested once and the overall accuracy was calculated based on the confusion matrix. In confusion matrix, the actual sample groups were compared with predicted sample groups.

The Key Methylation Features Identified With Boruta
We ran Boruta feature selection algorithm on the methylation data and got 10 key methylation features. These 10 key methylation features were listed in Table 2. The probes were annotated to genome positions (Genome Build 37) and genes using the official annotation file from Illumina. Sometime, one position may be associated with multiple genes. Therefore, the 10 methylation probes can be mapped onto 15 gene symbols. 4 https://CRAN.R-project.org/package=e1071 We checked the GO annotation of these genes and found that cg16703882 (SHOX2) was associated with GO:0007507: heart development which was closely relevant to AF.
We plotted the heatmap of these 10 key methylation features in Figure 1. It can be seen that most of the 10 patients with paroxymal atrial fibrillation (g1), 10 patients with persistent atrial fibrillation (g2) and 10 healthy individuals (g3) were cluster into the right groups.

The Key Gene Expression Features Identified With Boruta
Similarly, we ran Boruta feature selection algorithm on the RNA-Seq gene expression data and got 10 key gene expression features. Frontiers in Bioengineering and Biotechnology | www.frontiersin.org FIGURE 1 | The heatmap of the key methylation features. The 30 samples were from three groups: 10 patients with paroxymal atrial fibrillation (g1), 10 patients with persistent atrial fibrillation (g2), and 10 healthy individuals (g3). It can be seen that most samples were cluster into the right groups.
The 10 key gene expression features were given in Table 3. We also plotted their heatmap in Figure 2. The clusters were also largely correct.

The Classification Performance of Methylation and Gene Expression
From Figures 1, 2, we can see that both methylation and gene expression features can correctly cluster most samples. But we would like to evaluate their performance objectively and quantitatively. Therefore, we applied LOOCV to test the SVM classifiers of methylation and gene expression features. The confusion matrixes of methylation features and gene expression features were listed in Tables 4, 5, respectively. From Table 3, we can see that the AF patients (g1 + g2) and healthy individuals (g3) were perfectly classified using the methylation features, but the methylation data did not have great performance on classifying the subtype of AF (g1 and g2). From Table 4, we can see that the two subtype of AF, paroxysmal atrial fibrillation (g1) and persistent atrial fibrillation (g2), had very different gene expression pattern. In other words, the methylation data and gene expression data complement each other. The methylation data can be used to predict the AF and the gene expression data can be used to classify the subtypes of AF.
FIGURE 2 | The heatmap of the key gene expression features. The 30 samples were from three groups: 10 patients with paroxymal atrial fibrillation (g1), 10 patients with persistent atrial fibrillation (g2), and 10 healthy individuals (g3). It can be seen that most samples were cluster into the right groups.

Predicted g1 Predicted g2 Predicted g3
Actual g1 9 0 1 Actual g2 0 9 1 Actual g3 1 0 9 We checked the wrongly predicted samples in Tables 3, 4. They were different. Within the 30 samples, 22 samples had the same predicted labels by expression and methylation. All these 22 samples were correctly predicted. For the 8 inconsistent samples between expression and methylation predictions, at least one of the two predictions (the expression-based prediction and the methylation-based prediction) was correct. In other words, all the samples can be corrected classified based on either expression or methylation. The expression-based prediction and the methylation-based prediction were complementary.

The Methylation-Expression Regulation Network
We mapped the genes of methylation features and gene expression features to the STRING network (Version 11.0 5 ) (Szklarczyk et al., 2018) and visualized the network using R package igraph (Csardi and Nepusz, 2006) 6 to identify the potential relationship between two candidate genes sets. The methylation-expression regulation network was shown in Figure 3. In the network, the methylation and expression genes were marked in red and green. The methylation genes located in three clusters: EIF3CL-EIF3C, KIF15-TRIM59, and SHOX2-FOXG1-CNTNAP5. The three expression genes (PSMC3, TINAG, and NUDT6) were connected with methylation gene KIF15. Even EPN3 can be indirectly connected to KIF15.
That made KIF15 at the center of the network. These results suggested that KIF5 may play important roles in the pathogenesis of AF. 5 http://string-db.org 6 https://CRAN.R-project.org/package=igraph DISCUSSION DNA methylation, a pre-transcriptional modification characterized by the addition of methyl groups to specific nucleotides, regulates the stability of gene expression states and maintains genome integrity by collaborating with proteins that modify nucleosomes (Ma et al., 2014;Tao et al., 2016;Shen et al., 2017). Previous studies considered that changes in DNA methylation states contribute to the regulation of biological processes underlying AF, such as fibrosis, atrial dilatation, atrial fibroblast proliferation and differentiation from fibroblasts into myofibroblasts (Zhao et al., 2017). To further enhance the biological understanding of the atrial fibrillation, our study focused on DNA methylation, particularly with respect to how it relates to mRNA expression. Among our two gene sets of top 10 genes, we found PDED4, SHOX2, and EMD were the most important genes for AF which have been reported associated with AF in previous reference. Atrial fibrillation is reported to be associated with a profound remodeling of membrane receptors and alterations in cAMP dependent regulation of Ca2 + handling. PDE4 is expressed in human atrial myocytes and accounts for approximately 15% of PDE (phosphodiesterase) activity (Molina et al., 2012). PDE4D encoded protein has 3' ,5'-cyclic-AMP phosphodiesterase activity and degrades cAMP, which acts as a signal transduction molecule in multiple cell types and represents the major PDE4 subtype (Berk et al., 2016). The activity of PDEs decreased with age, and the relative PDED4 activity was lower in patients with permanent atrial fibrillation than in age-matched sinus rhythm controls (Milton et al., 2011). Previous study provided evidence that patients with pAF were found to have a decreased PDE4 activity as compared with patients in sinus rhythm (Yeh et al., 2007).
Short Stature Homeobox 2 (SHOX2) is a member of the homeobox family of genes in which mutations associated with early-onset and familial AF (Hoffmann et al., 2019). SHOX2 is considered as a key regulator of sinus node development of which deficiency could lead to bradycardia in animal models (Vicente-Steijn et al., 2017). Previous study demonstrated SHOX2 was susceptible for SND and AF by screening 98 SND patients and 450 individuals with AF. In the heart development of mouse and zebrafish, they also proved SHOX2 plays an important role, the mutation of SHOX2 could lead to severe bradycardia (Blaschke et al., 2007;Ye et al., 2015).
Emerin (EMD) encodes a serine-rich nuclear membrane protein which located on the cytoplasmic surface of the inner nuclear membrane and related to X-linked Emery-Dreifuss muscular dystrophy (EDMD) (Capanni et al., 2009). Previous study found a nonsense mutation in EMD from two EDMD families which is associated with X-linked recessive inheritance, result in serious cardiac complication, including AF (Sakata et al., 2005). Cardiologic assessment revealed slow atrial fibrillation in a recent case of a 65-year-old male patient with a hemizygous duplication of 5 bases in exon 6 of the EMD, gene on the X chromosome (Kissel et al., 2009;Zhao et al., 2014;Brisset et al., 2019).

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation, to any qualified researcher.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the Ethics Committee of Shanghai East Hospital. The patients/participants provided their written informed consent to participate in this study. Written informed consent was obtained from the individual(s) for the publication of any potentially identifiable images or data included in this manuscript.

AUTHOR CONTRIBUTIONS
YZ and CG conceived and designed the experiments. BL, XS, and KD performed the experiments. ML, YQ, and SZ analyzed the data. BL, XS, and YZ wrote the manuscript. All authors read and approved the final version of the manuscript.