Functional Prediction of Chronic Kidney Disease Susceptibility Gene PRKAG2 by Comprehensively Bioinformatics Analysis

The genetic predisposition to chronic kidney disease (CKD) has been widely evaluated, especially using genome-wide association studies (GWAS), which have highlighted novel susceptibility variants in many genes for CKD and for the estimated glomerular filtration rate, the measure used to diagnose and stage CKD. Of these variants, rs7805747 in PRKAG2 was associated with both serum creatinine and CKD at the genome-wide significance level. However, the mechanism by which rs7805747 affects CKD risk remains unclear. Here, we performed a functional analysis of the rs7805747 variant using multiple bioinformatics software tools and databases. Using RegulomeDB and HaploReg (version 4.1), rs7805747 was predicted to be located in enhancer histone marks (Liver, Duodenum Mucosa, Fetal Intestine Large, Fetal Intestine Small, and Right Ventricle tissues). Using GWAS analysis in PhenoScanner, we showed that rs7805747 is significantly associated not only with CKD but also with other diseases or phenotypes. Using metabolite analysis in PhenoScanner, rs7805747 was significantly associated not only with serum creatinine but also with 16 other metabolites. Using eQTL analysis in PhenoScanner, rs7805747 was significantly associated with the expression of multiple genes, including PRKAG2, in multiple human tissues. Gene expression analysis of PRKAG2 in 53 tissues from GTEx RNA-Seq data (8555 samples from 570 donors) showed that PRKAG2 had the highest median expression in Heart - Atrial Appendage. Using gene expression profiles in human CKD, we further identified differential expression of the PRKAG2 gene in CKD cases compared with control samples. In summary, our findings provide new insight into the underlying susceptibility of the PRKAG2 gene to CKD.


INTRODUCTION
Chronic kidney disease (CKD) is a major global health problem characterized by the permanent loss of kidney function, and is also associated with an increased risk of cardiovascular disease (Cusumano and Gonzalez Bedat, 2008;Prodjosudjadi et al., 2009;Chambers et al., 2010;Shinohara, 2010;Sherwood and McCullough, 2016;James et al., 2017;Malhotra et al., 2017;Sinha and Bagga, 2018).
The overall prevalence of CKD in the general population exceeds 10%, with estimates of approximately 14%, and its incidence is increasing (Almirall, 2016;Hursitoglu, 2016;Mills and He, 2016;Wuttke and Kottgen, 2016;Hedayati et al., 2017;James et al., 2017;Clark et al., 2018). Up to 20% of CKD cases are reported to be caused by genetic forms of renal disease (Almirall, 2016;Hursitoglu, 2016;Mills and He, 2016;Wuttke and Kottgen, 2016;Hedayati et al., 2017;James et al., 2017;Clark et al., 2018). Understanding the genetic predisposition to CKD and uncovering the underlying pathophysiological mechanisms may contribute to the development of targeted therapies. In recent years, the genetic predisposition to CKD has been widely evaluated, especially using genome-wide association studies (GWAS), which have highlighted novel susceptibility variants in many genes for CKD and for the estimated glomerular filtration rate, the measure used to diagnose and stage CKD (Pattaro et al., 2016;Wuttke and Kottgen, 2016).
Among these CKD risk genes, the genetic variant rs7805747 in PRKAG2 was associated with both serum creatinine and CKD at the genome-wide significance level (Chambers et al., 2010). The rs7805747 variant (chr7:151407801 in hg19) is located in an intron of PRKAG2, which is a protein-coding gene. However, the mechanism by which rs7805747 affects CKD risk remains unclear. It is difficult to identify the function of coding and non-coding genes in molecular wet laboratories. However, computational methods, including various bioinformatics software tools and databases, may be useful to guide and predict function (Zou et al., 2016;Wei et al., 2017a,b;He et al., 2018a,b;Jia et al., 2018a,b;Jiang et al., 2018;Zeng et al., 2018). Here, we performed a functional analysis of the rs7805747 variant using multiple bioinformatics databases including RegulomeDB (Boyle et al., 2012), HaploReg (version 4.1) (Ward and Kellis, 2016), PhenoScanner (version 1.1) (Staley et al., 2016), and the UCSC Genome Browser (Rosenbloom et al., 2015;Tyner et al., 2017;Casper et al., 2018), as in previous studies (Lu et al., 2011;Rhie et al., 2013;Hazelett et al., 2014;Liu et al., 2016, 2017a,b,c,d,e, 2018b;Guo et al., 2017;Hu et al., 2017a,b;Jiang et al., 2017;Zhang et al., 2018). In addition, we analyzed whole-genome case-control expression profiles in human CKD to investigate whether the susceptibility gene PRKAG2 is differentially expressed in CKD cases compared with control samples.

Regulatory Analysis of rs7805747 Using RegulomeDB
The RegulomeDB database annotates genetic variants with known and predicted regulatory elements in the intergenic regions of the human genome (Boyle et al., 2012). In brief, the known and predicted regulatory DNA elements include regions of DNase hypersensitivity, binding sites of transcription factors, and promoter regions that have been biochemically characterized to regulate transcription (Boyle et al., 2012). These regulatory element datasets are from the Gene Expression Omnibus (GEO), the Encyclopedia of DNA Elements (ENCODE) project, and the published literature (Boyle et al., 2012).

Functional Analysis of rs7805747 Using HaploReg
HaploReg is a tool for exploring annotations of non-coding variants (Ward and Kellis, 2012, 2016). HaploReg v4 included LD information from the 1000 Genomes Project, chromatin state and protein binding annotation from the Roadmap Epigenomics and ENCODE projects, sequence conservation across mammals, the effect of SNPs on regulatory motifs, and the effect of SNPs on expression from eQTL studies (Ward and Kellis, 2012, 2016). More detailed information is provided in the original studies (Ward and Kellis, 2012, 2016).

Functional Analysis of rs7805747 Using PhenoScanner
PhenoScanner included publicly available results from large-scale GWAS, covering about 3 billion associations, over 10 million unique single nucleotide polymorphisms (SNPs), and a broad range of phenotypes (Staley et al., 2016). The results are aligned across traits to the same effect and non-effect alleles for each SNP (Staley et al., 2016). Here, we performed three kinds of functional analysis using the GWAS, Metabolites, and eQTL analysis options (Staley et al., 2016). For the GWAS analysis, PhenoScanner included 88 GWAS datasets covering 76 diseases or phenotypes (Staley et al., 2016). For the Metabolites analysis, PhenoScanner included two metabolomics datasets (Shin et al., 2014;Kettunen et al., 2016). For the eQTL analysis, PhenoScanner included several eQTL datasets from eQTL Browser, Geuvadis, GTEx (version 6), MuTHER, and bloodeqtlbrowser. More detailed information is provided in the original study (Staley et al., 2016).
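The alignment of results to a common effect allele can be illustrated with a minimal Python sketch. It is not PhenoScanner's implementation: the `Association` record, the `align_to_allele` helper, the trait names, and the beta values are all hypothetical, and the sketch assumes both alleles are reported on the same strand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Association:
    """One SNP-trait summary statistic (all field names are illustrative)."""
    trait: str
    effect_allele: str
    other_allele: str
    beta: float


def align_to_allele(assoc: Association, target_allele: str) -> Association:
    """Re-express an association so that target_allele is the effect allele."""
    if assoc.effect_allele == target_allele:
        return assoc  # already aligned, nothing to change
    if assoc.other_allele == target_allele:
        # Swapping effect and non-effect alleles flips the sign of beta.
        return Association(assoc.trait, assoc.other_allele,
                           assoc.effect_allele, -assoc.beta)
    raise ValueError(f"{target_allele!r} is not an allele of this record")


# Hypothetical records for one SNP, reported against different alleles.
records = [
    Association("trait_1", "A", "G", 0.25),
    Association("trait_2", "G", "A", 0.5),
]
aligned = [align_to_allele(r, "A") for r in records]
for r in aligned:
    print(r.trait, r.effect_allele, r.beta)
```

After alignment, a positive beta in every record refers to the same allele, so effect directions can be compared across traits.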

Gene Expression Analysis of PRKAG2 in GTEx
We evaluated the expression of PRKAG2 using the RNA-Seq datasets from the NIH Genotype-Tissue Expression (GTEx) project, which was created to establish a sample and data resource for studies on the relationship between genetic variation and gene expression in multiple human tissues (GTEx Consortium, 2013;Mele et al., 2015). The GTEx project included median gene expression levels in 51 tissues and 2 cell lines (V6, October 2015) (GTEx Consortium, 2013;Mele et al., 2015). This release is based on data from 8555 tissue samples obtained from 570 adult postmortem individuals (GTEx Consortium, 2013;Carithers et al., 2015;Mele et al., 2015). Here, we used the UCSC Genome Browser to evaluate the expression of PRKAG2 in the 53 GTEx human tissues (V6, October 2015). The UCSC Genome Browser is a web-based tool for visualizing genomic data and annotations (Meyer et al., 2013;Karolchik et al., 2014;Rosenbloom et al., 2015;Speir et al., 2016;Tyner et al., 2017;Casper et al., 2018).
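The per-tissue summary described above (a median expression level for each tissue, the tissue with the highest median, and the sum of the medians) can be sketched as follows. This is only an illustration of the arithmetic, not the GTEx or UCSC pipeline: the function name `summarize_expression`, the tissue labels, and the RPKM values are made up.

```python
from statistics import median
from typing import Dict, List, Tuple


def summarize_expression(
    samples_by_tissue: Dict[str, List[float]]
) -> Tuple[Dict[str, float], str, float]:
    """Return per-tissue medians, the top tissue, and the sum of medians."""
    if not samples_by_tissue:
        raise ValueError("no tissues supplied")
    # Median expression of the gene across the samples of each tissue.
    medians = {t: median(v) for t, v in samples_by_tissue.items()}
    # Tissue whose median expression is highest.
    top_tissue = max(medians, key=medians.get)
    # Total median expression: the medians summed over all tissues.
    total = sum(medians.values())
    return medians, top_tissue, total


# Made-up RPKM values for three hypothetical tissues.
toy = {
    "tissue_a": [1.0, 3.0, 2.0],
    "tissue_b": [10.0, 12.0],
    "tissue_c": [0.5],
}
medians, top_tissue, total = summarize_expression(toy)
print(top_tissue, medians[top_tissue], total)
```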

Case-Control Gene Expression Analysis of PRKAG2
We analyzed whole-genome case-control expression profiles in human CKD (Nakagawa et al., 2015). In the original study, a microarray analysis of renal biopsy specimens from CKD patients was conducted to identify the genes associated with tubulointerstitial fibrosis and tubular cell injury in CKD (Nakagawa et al., 2015). That study reported microarray profiles for a total of 61 samples, including 53 biopsy specimens from CKD patients and 8 controls (Nakagawa et al., 2015). Here, we used the web tool GEO2R to evaluate whether the PRKAG2 gene is significantly dysregulated in CKD cases compared with control samples, as in a recent study (Liu et al., 2018a). The significance level was defined as P < 0.01.

Regulatory Analysis of rs7805747 Using RegulomeDB
Using RegulomeDB, the predicted score was 5, which suggests that rs7805747 is likely to affect transcription factor (TF) binding or a DNase peak. The predicted binding protein is HNF4A (chr7:151407767-151408030 by ChIP-seq in the Caco2 cell type) (Verzi et al., 2010). The histone modification analysis showed that rs7805747 was predicted to be located in enhancer histone marks (Liver, Fetal Intestine Large, Right Ventricle, Duodenum Mucosa, and Fetal Intestine Small). Key information about the regulatory analysis is provided in Table 1.
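As a quick consistency check, the variant position given in the Introduction (chr7:151407801, hg19) can be compared with the reported HNF4A ChIP-seq interval. The sketch below assumes that the peak coordinates are on the same assembly as the variant and that both ends of the interval are inclusive; the `Interval` type and `contains` helper are illustrative names, not part of RegulomeDB.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """Genomic interval; both ends are treated as inclusive (assumption)."""
    chrom: str
    start: int
    end: int


def contains(interval: Interval, chrom: str, pos: int) -> bool:
    """True if the position lies on the same chromosome inside the interval."""
    return interval.chrom == chrom and interval.start <= pos <= interval.end


# Coordinates as reported in the text; a shared assembly is assumed.
HNF4A_PEAK = Interval("chr7", 151407767, 151408030)
SNP_CHROM, SNP_POS = "chr7", 151407801

inside = contains(HNF4A_PEAK, SNP_CHROM, SNP_POS)
offset = SNP_POS - HNF4A_PEAK.start  # distance from the start of the peak
print(inside, offset)
```

Under these assumptions the variant falls inside the reported peak, 34 bp from its start.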

Functional Analysis of rs7805747 Using PhenoScanner in GWAS
Using the GWAS option in PhenoScanner, we identified 43 significant association results with P < 0.01. In addition to CKD, we found that rs7805747 is also significantly associated with other diseases or phenotypes including hemoglobin (Hb), hematocrit (Hct), red blood cell count (RBC), systolic blood pressure (SBP), breast cancer, gout, hypertension, and extraversion, as described in Table 2.

Gene Expression Analysis of PRKAG2 in GTEx
Using the UCSC Genome Browser, the results showed that PRKAG2 had its highest median expression, 34.90 RPKM, in Heart - Atrial Appendage (Ensembl gene ID: ENSG00000106617.9, genomic position: hg38 chr7:151556111-151877125). The total median expression of PRKAG2 across all 53 tissues was 412.37 RPKM. Figure 1 provides more detailed information.

Case-Control Gene Expression Analysis of PRKAG2
Three probes measure the expression of the PRKAG2 gene in the human CKD gene expression profiles: A_24_P384779, A_23_P44366, and A_23_P314760. Each of these three probes represents a different region of the PRKAG2 gene. These [...] As shown in Table 3, there are 15 unique genes regulated by rs7805747. In addition to PRKAG2, we also evaluated the expression of the other 14 genes provided in Table 3. The results showed that 4 of the other 14 genes (TMUB1, AGAP3, XRCC2, and WDR86-AS1) were also differentially expressed in CKD cases with P < 0.01. Table 5 provides detailed information about the 23 probes of the 15 genes, including PRKAG2. Importantly, the differential expression of PRKAG2, TMUB1, AGAP3, and XRCC2 passed the multiple testing correction threshold of 0.01/23 = 0.000435.
Table notes: Beta is the regression coefficient based on the A allele. In the eQTL results, the A allele is associated with increased (Beta > 0) or reduced (Beta < 0) expression of nearby genes. In the metabolite results, the A allele is associated with increased (Beta > 0) or reduced (Beta < 0) metabolite levels.
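The multiple testing correction threshold reported in this section divides the significance level by the number of probes tested, which corresponds to a Bonferroni correction. A minimal sketch of that arithmetic follows; the helper names, probe labels, and P values are hypothetical and are not taken from Table 5.

```python
from typing import Dict, List


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold: alpha divided by the number of tests."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def passing_probes(p_values: Dict[str, float], alpha: float,
                   n_tests: int) -> List[str]:
    """Names of probes whose P value is below the corrected threshold."""
    threshold = bonferroni_threshold(alpha, n_tests)
    return sorted(name for name, p in p_values.items() if p < threshold)


# Significance level 0.01 and 23 probes, as stated in the text.
threshold = bonferroni_threshold(0.01, 23)
print(f"{threshold:.6f}")

# Hypothetical probe-level P values, for illustration only.
toy_p = {"probe_1": 0.0001, "probe_2": 0.005, "probe_3": 0.0004}
survivors = passing_probes(toy_p, alpha=0.01, n_tests=23)
print(survivors)
```

In this toy input, probe_2 would pass the uncorrected P < 0.01 level but not the corrected threshold.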

DISCUSSION
PRKAG2 encodes the gamma2-subunit isoform of 5'-AMP-activated protein kinase (AMPK) (Ahmad et al., 2005;Banerjee et al., 2007;Folmes et al., 2009;Kim et al., 2012;Thorn et al., 2013;Zhang et al., 2013;Hinson et al., 2016, 2017). AMPK is a metabolic enzyme that plays important roles in regulating energy metabolism in response to cellular stress. A recent integrative analysis of PRKAG2 cardiomyopathy iPS and microtissue models identified AMPK as a regulator of metabolism, survival, and fibrosis (Hinson et al., 2016). In addition, mutations in PRKAG2 have been associated with hypertrophic cardiomyopathy. Over the past decade, GWAS have considerably improved our understanding of the genetic basis of kidney function and disease (Wuttke and Kottgen, 2016). The SNP rs7805747, identified by a CKD GWAS, lies within an intron of PRKAG2. Here, we performed a comprehensive functional analysis of this variant using multiple bioinformatics databases including RegulomeDB (Boyle et al., 2012), HaploReg (version 4.1) (Ward and Kellis, 2016), and PhenoScanner (version 1.1) (Staley et al., 2016). Using RegulomeDB, the predicted score was 5, which suggests that rs7805747 is likely to affect transcription factor (TF) binding or a DNase peak. The predicted binding protein is HNF4A (chr7:151407767-151408030 by ChIP-seq in the Caco2 cell type). In addition, rs7805747 was predicted to be located in enhancer histone marks (Liver, Fetal Intestine Large, Right Ventricle, Duodenum Mucosa, and Fetal Intestine Small). Using HaploReg (version 4.1), we found rs7805747 to be associated with enhancer histone marks, DNase hypersensitivity, and changed motifs. In HaploReg (version 4.1), rs7805747 was also predicted to be located in enhancer histone marks (Liver, Duodenum Mucosa, Fetal Intestine Large, Fetal Intestine Small, and Right Ventricle tissues).
Hence, the findings in HaploReg (version 4.1) were consistent with RegulomeDB.
Using the GWAS option in PhenoScanner, we showed that rs7805747 is significantly associated not only with CKD but also with other diseases or phenotypes including hemoglobin (Hb), hematocrit (Hct), red blood cell count (RBC), SBP, breast cancer, gout, hypertension, and extraversion.
Using the eQTL option in PhenoScanner, rs7805747 was significantly associated with the expression of multiple genes, including PRKAG2, in multiple human tissues. A previous study reported that rs7805747 is associated with serum creatinine and CKD (Chambers et al., 2010). Using the metabolites option in PhenoScanner, rs7805747 was significantly associated not only with serum creatinine but also with 16 other metabolites, as described in Table 4.
Gene expression analysis of PRKAG2 in 53 tissues from GTEx RNA-Seq data (8555 samples from 570 donors) showed that PRKAG2 had the highest median expression in Heart - Atrial Appendage. Using gene expression profiles in human CKD, we further identified differential expression of the PRKAG2 gene in CKD cases compared with control samples. Taken together, these findings indicate that rs7805747 is associated with CKD risk, PRKAG2 gene expression, and 17 metabolites, and that PRKAG2 is differentially expressed in CKD cases. In summary, our findings provide new insight into the underlying susceptibility of the PRKAG2 gene to CKD.

AUTHOR CONTRIBUTIONS
EW conceived and initiated the project and performed the functional analysis. EW, HZ, DZ, LL, and LD wrote the manuscript. All authors reviewed the manuscript and contributed to the final manuscript.