Incorporating Information of microRNAs into Pathway Analysis in a Genome-Wide Association Study of Bipolar Disorder

MicroRNAs (miRNAs) are known to be important post-transcriptional regulators that are involved in the etiology of complex psychiatric traits. The present study aimed to incorporate miRNAs information into pathway analysis using a genome-wide association dataset to identify relevant biological pathways for bipolar disorder (BPD). We selected psychiatric- and neurological-associated miRNAs (N = 157) from PhenomiR database. The miRNA target genes (miTG) predictions were obtained from microRNA.org. Canonical pathways (N = 4,051) were downloaded from the Molecule Signature Database. We employed a novel weighting scheme for miTGs in pathway analysis using methods of gene set enrichment analysis and sum-statistic. Under four statistical scenarios, 38 significantly enriched pathways (P-value < 0.01 after multiple testing correction) were identified for the risk of developing BPD, including pathways of ion channels associated (e.g., gated channel activity, ion transmembrane transporter activity, and ion channel activity) and nervous related biological processes (e.g., nervous system development, cytoskeleton, and neuroactive ligand receptor interaction). Among them, 19 were identified only when the weighting scheme was applied. Many miRNA-targeted genes were functionally related to ion channels, collagen, and axonal growth and guidance that have been suggested to be associated with BPD previously. Some of these genes are linked to the regulation of miRNA machinery in the literature. Our findings provide support for the potential involvement of miRNAs in the psychopathology of BPD. Further investigations to elucidate the functions and mechanisms of identified candidate pathways are needed.


INTRODUCTION
Bipolar disorder (BPD) is a highly heritable psychiatric illness. The genetic components were estimated to account for as high as ∼80% of phenotypic variability (McGuffin et al., 2003). Although many candidate and genome-wide association (GWA) studies have conducted to investigate the complex nature of pathogenetics in BPD, previously reported genetic findings only account for a small proportion of its heritability (Gershon et al., 2011). The missing heritability may be partially explained by the limited numbers, types, and frequency of susceptible variants that currently genotyped in high-throughput array, and other mechanisms such as gene × gene or gene × environment interactions, as well as the heterogeneity in phenotype definitions across studies (Manolio et al., 2009). Nevertheless, large-scale GWA studies remain to be an efficient and promising study design to uncover the underlying etiology of complex psychiatric disorders (Sullivan and Investigators, 2012), while new theoretical framework and statistical approaches must be taken into consideration.
Recently, pathway-based analysis, which simultaneously tests a group of functionally related genes, has been widely used as an alternative and complementary strategy to bring more insights into the biological mechanisms of disease of interest (Wang et al., 2010). In addition, inclusion of prior information from other aspects, such as gene expression or gene regulation in GWAS analysis offers great opportunities to identify new association findings and to generate novel hypotheses (Tintle et al., 2009a). For instance, two prior GWA studies in type 2 diabetes and osteoporosis applied integrative approaches that used gene expression data and pathway-based analysis to identify novel associated pathways and loci (Hsu et al., 2010;Zhong et al., 2010). In addition to gene expression information, other types of data could also be incorporated into pathway-based analysis using GWA data, such as methylation (Chuang et al., 2012) and microRNAs (miRNAs) patterns, especially disease-associated miRNAs.
The miRNAs are one kind of functional non-coding RNAs acting as post-transcriptional regulators for translation and the stability of mRNAs, which involved in a wide range of biological processes, including regulation of brain and neuronal development (Fiore et al., 2008). The miRNA dysregulation has been reported to play important roles in the etiology of many diseases, including complex psychiatric traits (Xu et al., 2010). Previously, many psychiatric-and neurological-associated miRNAs were identified from expression studies of postmortem brain and animal models (Forero et al., 2010), and from genetic association studies of variants in genes encoding miRNAs and binding site of miRNAs target genes (Muinos-Gimeno et al., 2009; The Schizophrenia Psychiatric Genome-Wide Association Study (GWAS) Consortium, 2011).
We have known that an individual miRNA could target hundreds of mRNA molecules (Lim et al., 2005), therefore the target genes of a phenotype-related miRNA may potentially associate with the trait. For example, abnormal expression level of brainexpressed miR-132 has been reported to be associated with psychiatric disorders via affecting the expression of brain-derived neurotrophic factor that is involved in dendritic plasticity (Klein et al., 2007;Wayman et al., 2008;Askland et al., 2009;Forero et al., 2010). In addition, different psychiatric disorders (e.g., schizophrenia and BPD) may share some degree of the genetic factors through the involvement of similar biological pathways. Thus, a single miRNA may play roles in the etiology of more than one psychiatric disorder. Given these features of miRNAs, it is our interest to incorporate information of phenotype-related miRNAs and their predicted targets into the GWA analysis, which provide a new avenue for researchers to investigate the underlying genetic components that are associated with BPD.
In the current study, we performed pathway-based analysis using large-scale GWA dataset of BPD in combination with the data source of miRNAs. The psychiatric-and neurologicalassociated miRNAs were identified from PhenomiR database and miRNA target predictions were obtained from microRNA.org database. Our main goal is to better identify genes and important biological pathways to be associated with BPD while incorporating the regulatory information of miRNAs.

Figure 1
shows the data integration flowchart in our study to identify the psychiatric-and neurological-associated miRNAs and their target genes, and the annotated pathways for pathway-based analysis. Details are mentioned below.

GENOME-WIDE ASSOCIATION DATASET
The Genetic Association Information Network (GAIN) GWA dataset was downloaded from dbGaP 1 . We extracted BPD dataset from the GAIN: full details of subject enrollment and genotyping can be obtained in the original article (The GAIN Collaborative Research Group, 2007). The GWA dataset of BPD comprised 1,001 BPD cases and 1,034 controls, which used Affymetrix Genome-Wide Human SNP Array 6.0 platform for SNP (single nucleotide polymorphism) genotyping. After applying quality control filters and excluding SNPs in sexual chromosomes, there were 698,227 autosomal SNPs in the GWA dataset in our analyses.

IDENTIFICATION OF TARGET GENES OF PSYCHIATRIC DISEASE-ASSOCIATED miRNAs
Information of disease-associated miRNAs was downloaded from the PhenomiR database (Ruepp et al., 2010). The PhenomiR 1 http://www.ncbi.nlm.nih.gov/gap collected published miRNA-disease associations via manual curation. It annotates diseases into 22 classes according to the Online Mendelian Inheritance in Man (OMIM) Morbid Map. In our analysis, we selected all miRNAs that are associated with neurological and psychiatric classes as the candidates of disease-associated miRNAs. In total, there were 157 unique miRNAs to form 293 miRNA-disease association pairs.
The miRNA target predictions were obtained from micor-RNA.org database 2 (Betel et al., 2008). The microRNA.org performed miRNA target prediction using miRanda algorithm (John et al., 2004) and scored the likelihood of mRNA downregulation of predicted target sites by using mirSVR algorithm (Betel et al., 2010). The combination of miRanda-mirSVR approach has been shown to effectively identify target predictions to cover a significant number of non-canonical sites, and has competitive ability in predicting expression changes of mRNA or proteins when comparing with other target prediction methods (Betel et al., 2010). In total, 1,097,064 "good mirSVR score-conserved miRNA" target predictions were used in this procedure. Combining these two datasets while follows the criteria of alignment score ≥140, seed site ≥6, free energy ≤−17, and conservation score ≥0.57 (Figure 1), we identified 8,921 genes which were predicted to be the targets of psychiatric-and neurological-associated miRNAs.
We also used another miRNA target prediction algorithm, DIANA-microT, which considers not only strong binding (at least seven consecutive Watson-Crick base pairing nucleotides) but also weak binding ability (only six paired nucleotides or G:U wobble pairs) to predict the miRNA target genes (miTG; Maragkakis et al., 2009a,b). DIANA-microT provides scores for miTG as an indicator for the probability of being a real target site. The calculation of an overall miTG score mainly based on scoring all binding types and conservation profile of all putative miRNA recognition element (MRE) within the 3 UTR using the weighted sum method. Therefore, target genes of psychiatric-and neurological-associated miRNAs with high predictive probability in significant pathways were filtered by the DIANA-microT algorithm. We used a miTG score greater than 19 (a strict threshold) as the selection criterion, which implicates the predicted target was highly reliable being a true miRNA target.

STATISTICAL ANALYSIS
We used PLINK (version 1.07) to conduct single marker association analyses with additive model (Purcell et al., 2007). We first mapped SNPs to genes to obtain gene-level statistic for BPD using the GWA dataset in GAIN. SNPs were mapped to genes if they located within 5 kb of the 5 upstream and 3 downstream of a gene using NCBI human genome build 36. For each gene, the smallest P-value (P min ) among all SNPs within the gene region was used to represent the gene-level statistic. In total, there were 304,343 SNPs assigned into 16,385 genes in the GWA dataset of BPD.
Annotated pathways were obtained from the Molecule Signature Database, MsigDB (Subramanian et al., 2005). MsigDB consists of several online pathway databases, including Kyoto Encyclopedia of Genes and Genomes (KEGG), BioCarta, Reactome, Gene Ontology (GO) terms, and gene sets collected from  published literature. From MsigDB, we obtained 4,726 pathways that cover 22,429 genes. Pathways containing less than 10 genes or more than 380 genes were excluded to avoid bias due to extreme small or large pathway size. Thus, there were 4,051 canonical pathways in the pathway-based analyses using the GWA dataset of BPD in the present study. Pathway-based analyses were conducted using both competitive and self-contained approaches (Wang et al., 2010) to capture a broader range of important pathways. The gene set enrichment analysis (GSEA), as a competitive method, first ranks P min values of all genes from the smallest to the largest. Then, for a given pathway, an enrichment score (ES) was calculated based on gene-wise statistic values (t j ), which were defined as the χ 2 statistic of the corresponding most significant SNP to evaluate association signals (Wang et al., 2007). The sum-statistic (SUM) approach, as a selfcontained test, sums up gene-wise statistic values (t i ) over the set of genes (Σ S i=1 t i ) in a specific pathway (Tintle et al., 2009b). The details of calculation procedures were provided in our previous study .

WEIGHTING PROCEDURE
We employed a weighting scheme for genes that were predicted to be psychiatric-and neurological-associated miRNA targets in every annotated pathway. First, we calculated the overall proportion of SNPs with P-value <0.05 in the whole GAIN dataset. Second, in a given gene, the proportion of significant SNPs was calculated and then compared with the proportion of significant SNPs in the whole GWA dataset to evaluate whether this gene was informative. The detail of weighting procedures was described as below. For each pathway, n and m represent the number of miTGs and non-miTGs, respectively. K n and K m were the number of informative genes in the n miTGs and m non-miTGs, respectively. Therefore, the proportion of informative genes in the miTGs and non-miTGs were K n /n and K m /m. The Harmonic average (H), defined as 1/[(1/mK n ) + (1/nK m )], was used as the basis of our weighting scheme for calculating the gene-wise weights of the miRNA and non-miTGs in a pathway and to minimize the potential bias in the pathway analysis due to the variation of pathway size. If K n /n was greater than K m /m, the weights for miTGs and non-miTGs were assigned mk n /H and nk m /H, respectively. If no informative genes exist in non-miRNA genes, a weight one was assigned to non-miRNA genes; while a weight, ranging from one to six, according to the proportion of informative genes (using 0.1, 0.3, 0.5, 0.7, and 0.9 as cut-off values), was assigned to miRNA genes. Otherwise, equal weights were used for the two sets of genes.
For each pathway, weights were assigned to miTGs and non-miTGs. Then all genes were classified into S set and NS set according to their involvement in a pathway or not. When using competitive methods, genes within a pathway were compared with genes not within the pathway. Regarding to self-contained methods, only genes within the pathway were considered. A total of 5,000 permutations were performed to evaluate the empirical significance level of each pathway. To account for multiple testing issues in the analyses, algorithm proposed by Benjamini and Hochberg (1995) was used to control for false discovery rate (FDR).

RESULTS
A total of 4,051 pathways were constructed and tested for associations with the risk of BPD using the GWA dataset of BPD. With the inclusion of 157 psychiatric-and neurological-associated miRNAs as the prior information into pathway-based analyses, we identified many enriched pathways for BPD. Under four testing scenarios, including weighted and non-weighted GSEA and SUM statistics, there were more than 100 significant pathways associated www.frontiersin.org with BPD at the level of empirical P-value < 0.01, and the number was reduced to more than 40 after FDR correction ( Table A1 in Appendix). Comparing with non-weighting scenario, pathway analysis under the weighting scenario identified additional 20 and 223 significant pathways (FDR < 0.01) by using GSEA and SUM methods, respectively ( Table A2 in Appendix). Under the non-weighted scheme, the number of enriched pathways identified by both the GSEA and SUM methods with FDR < 0.01 was 43, while the number was 38 under the weighted scheme. The union set of these enriched pathways were in total 62 pathways ( Table A3 in Appendix), including 18 annotated GO, 7 KEGG, and 37 curated gene sets. Among these pathways, 19 significant pathways were identified by both the GSEA and SUM methods using both non-weighted and weighted scheme. Table 1 showed 19 enriched pathways with stringent criterion of FDR < 0.01, including four GO gene sets and 15 curated gene sets, which exhibited strong associations with BPD under all four statistical scenarios. The three significant GO gene sets, cation transmembrane transporter activity, gated channel activity, and ion transmembrane transporter activity, were ion channel/transporter related. The fourth GO gene set was nervous system development, which was reported to be associated with BPD previously. After performing our weighting scheme, 19 additional pathways were identified at the significance level of FDR < 0.01 (Table 2), including six annotated GO, 3 KEGG, and 10 curated gene sets. Many of them are novel findings for BPD, such as cytoskeleton, retinol metabolism, drug metabolism other enzymes, etc. In total, we found 38 significant enriched pathways for BPD.

miRNA TARGET PREDICTION
The results of miRNA target predictions in 38 enriched pathways were further examined. Initially, miRNA target prediction was performed by using miRanda-mirSVR approach with previously described filtering criteria (Figure 1). There were 2,438 unique genes in 38 significantly enriched pathways. Among them, 546 had P min -value less than 0.01. Table 3 summarized the results of miRNA target predictions for these 546 genes. On average, 34.4% of the predicted miTGs had P min -value less than 0.01 in enriched pathways, indicating a higher probability of showing associations with BPD. As expected, the larger the numbers of genes or the numbers of miTGs in a pathway, the higher number of miRanda-mirSVR prediction was observed (correlation coefficient = 0.74 and 0.82, respectively) in the 38 enriched pathways. We then applied the second prediction algorithm, DIANA-microT, to increase the stringency of target genes prediction. These results were also shown in Table 3. By using the strict threshold at miTG score 19, we filtered out the predictions with less probability of correct prediction. Among the 38 pathways, as high as 88.9% of the miRanda-mirSVR predictions could be also predicted by DIANA-microT (ranged from 36.8 to 88.9%). The numbers of miRNA target predictions were also reduced (from 0 to 28). In total, there were 469 miRNA target predictions with miTG score >19 for genes with P min -value of target gene <0.01, which consisted of 113 unique genes and 45 miRNAs. Among these predictions, 22 miTGs were involved in more than three enriched pathways. The 22 miTGs and their corresponding associated miRNAs are displayed in Table 4. The functions of these genes are mainly related to potassium and calcium ion channels (e.g., KCNMA1, KCNQ5, KCNK2, PKD2, and RYR3), collagen (e.g., COL1A2, COL27A1, and COL5A1), and axon guidance (e.g., NF1B, NAV3, and PTPRD).

DISCUSSION
Analyzing GWA dataset with pathway-based approach utilizes information of multiple loci with similar physiological functions to bring biological insights into the mechanisms of BPD (Torkamani et al., 2008;Askland et al., 2009;Holmans et al., 2009;Peng et al., 2010). Integrating other data sources into the analysis framework further offers more opportunities in identifying diseaseassociated loci (Wang et al., 2010). The current study especially focuses on information obtained from miRNAs, which are essential in the regulation processes of brain and neuronal development. We performed pathway-based analyses using a GWA dataset of BPD while incorporating the disease-associated miRNA information into analysis. Many important pathways were identified through our analysis framework.
First, four enriched GO terms were identified for BPD, including cation transmembrane transporter activity, gated channel activity, ion transmembrane transporter activity, and nervous system development. Three of them are ion channel/transporter related. Adding the weighting scheme by miRNA information, we further identified two channel-related pathways ( Table 2), ion channel activity and substrate specific channel activity. The involvement of ion channels in the etiology of BPD was also implicated in other studies for BPD (Askland et al., 2009). We have known that ion channels and transporters are essential components in regulating neuronal excitability. Abnormality of ion channels has been suggested to be a plausible mechanism underlying BPD. To explain the recurrence and cycling nature of mood episodes in BPD, a kindling model was proposed as these clinical conditions are the consequences of neuronal hyperexcitability, which is linked to abnormal functions of ion channels (Mazza et al., 2007;Blumenfeld et al., 2009). Similarly, results in recent GWA and gene expression studies also support the involvement of ion channel genes in the etiology of BPD (Sklar et al., 2008(Sklar et al., , 2011Smolin et al., 2012). Thus, genes in the ion channels or their regulatory loci have been attractive candidates in studying the underlying mechanism for BPD. Our miTGs prediction grants further support for this line of evidence.
In our identified 38 significantly enriched pathways, many miTGs were functionally related to ion channels, especially for potassium (e.g., KCNMA1, KCNQ5, and KCNK2) and calcium (e.g., RYR3 and PKD2) channels. All these genes involved in multiple significant pathways. For instance, KCNMA1 is a calciumactivated potassium channel and KCNQ5 belongs to the voltagegated delayed rectifier potassium channel gene family. Both of them play important roles in the regulation of neuronal excitability (Laumonnier et al., 2006;Brown and Passmore, 2009). Molecular and functional studies found that defects of KCNMA1 contribute to autism and mental retardation (Laumonnier et al., 2006). For KCNK2 gene that encodes for a two-pore-domain background potassium channel, a recent genetic study revealed its association with susceptibility of major depressive disorder and response to antidepressant treatment (Liou et al., 2009). Additionally, several susceptible genes that cause abnormality in calcium signaling, Frontiers in Genetics | Non-Coding RNA  Boylan multiple myeloma C D DN  such as CACNA1C and Bcl-2, were reported to be associated with BPD (Sklar et al., 2008(Sklar et al., , 2011Distelhorst and Bootman, 2011). RYR3, a brain-specific ryanodine receptor for controlling intracellular calcium concentration, was found to be a susceptible gene for schizophrenia (Leonard and Freedman, 2006). In addition, the RYR3 knockout mice exhibited some abnormal behaviors, www.frontiersin.org Frontiers in Genetics | Non-Coding RNA including hyperlocomoter activity and decreased social interaction (Matsuo et al., 2009). Although how these behaviors defects and functional defects of ion channels link with the pathology of BPD are still unclear, it warrants to conduct further basic and functional research to investigate the roles of ion channels in BPD. Second, applying the weighting scheme for psychiatric-and neurological-associated miRNAs in the analysis, we identified several significant pathways for BPD that were involved many nervous related biological processes in Table 2 (Li et al., 2005;Nakatani et al., 2006;Ryan et al., 2006;Bremner and McCaffery, 2008;Ramocki and Zoghbi, 2008;Torkamani et al., 2008;Askland et al., 2009). Some pathways are novel findings for BPD but show their biological plausibility to neurological disorders in general, such as retinol metabolism (Maden, 2002), while other pathways are novel findings specific to BPD (not reported in other neurological disorders, such as drug metabolism other enzymes). Of note, it is known that necessity of cytoskeletal modulation play a role in the processes of nervous system development (Ramocki and Zoghbi, 2008). Additionally, neuroactive ligand receptor interaction pathway was reported to be associated with substance addiction, which is commonly observed comorbid condition in BPD patients (Li et al., 2005).
The miRNA regulation potentially contributes to the functions of these associated pathways in BPD. Recent findings exhibited the essential roles of miRNA machinery in many aspects of nervous system, including Dicer and miR-124 in neuronal development and miR-134 in synaptic development (Gao, 2008;Saba and Schratt, 2010). Regulation of calcium channel gene expression by miR-103 was also reported (Favereaux et al., 2011). In addition to enriched GO and KEGG pathways, we also identified many significant pathways that were obtained from curated data in the literature in MsigDB, which was mainly based on gene expression studies related with multiple cancers. Examining the functions of miRNA-associated genes in these enriched pathways suggested the involvement of different miRNA regulation in the etiology of BPD, including Notch signaling (e.g., JAG1), axonal growth, and guidance (e.g., CNTN4, NFIB, NAV3, and PTPRD), and cholesterol homeostasis (e.g., ABCA1) (Hekimi and Kershaw, 1993;Bixby, 2000;Karasinska et al., 2009;Mason et al., 2009;Shimoda and Watanbbe, 2009;Pedroso et al., 2012). In addition, our identified BPD-associated pathways consisted of many collagen related genes, such as COL1A2, COL27A1, and COL5A1, implicating that these genes may augment their impacts on BPD through miRNA regulation. Although the connection of collagen and BPD was rarely reported in the literature, the emerging data and evidence, came from in vivo studies suggested that collagen joined the processes of axonal growth and guidance, synaptogenesis, and Schwann cell myelination during the development of nervous system (Hubert et al., 2009).
Among these identified miTGs (in Table 4), some of them have been previously linked to the regulation of miRNA machinery.
A recent study showed the mediation of miR-34a and miR-21 on expression of JAG1, to regulate the differentiation of human monocyte-derived dendritic cell, which is involved in five of our identified curated gene sets (Hashimi et al., 2009). In addition, increased expression of NAV3 mRNA was observed in brain tissue of Alzheimer's disease and was suggested to be regulated by miR-29a (Shioya et al., 2010). On the contrary, the links between miR-NAs and BPD were rarely constructed. Most of psychiatric-and neurological-associated miRNAs used in this study were reported to be related to schizophrenia, Alzheimer disease, autism, and Parkinson disease (according to PhenomiR database). Future studies are also needed to uncover the impacts of these miRNAs on the etiology of BPD.
There are some limitations in the present study. First, P minvalue was used to represent the significance level of a gene. The information of other SNPs in a gene region may be missed. Nevertheless, previous studies showed that using P min -value in pathway analysis provides consistent results with other measures of gene-level statistic (Torkamani et al., 2008;Baranzini et al., 2009). Second, despite using more comprehensive pathway (e.g., MsigDB) and miRNA-disease/phenotype (e.g., PhenomiR) databases, the incompleteness of annotated pathways and miRNAdisease/phenotype information could have impacts on the correctness of the identified BPD-associated pathways. Third, the target genes of psychiatric-and neurological-associated miRNAs were predicted by computational methods. Although we used two miRNA target prediction algorithms to increase the correctness of prediction, further experimental validation by using functional studies are still needed in the future.
In conclusion, with integrating currently known psychiatricand neurological-associated miRNAs as prior information, our pathway-based analyses using the GWA dataset of BPD identified not only previously reported pathways, but also new pathways that showed intriguing biological plausibility for BPD. So far, miR-NAs studies in BPD are still in the infant stage. Our findings provided further evidence and support for exploring the roles of miRNA regulation in relation to nervous system for the risk of developing bipolar illness, especially ion channel regulation and axonal development. More, investigations remain to be done to elucidate the functions of these candidate pathways and genes, and the potential mechanisms involved with miRNA-mediated regulation.