# FROM GWAS HITS TO TREATMENT TARGETS

EDITED BY: Jeanette Erdmann and Tanja Zeller

PUBLISHED IN: Frontiers in Cardiovascular Medicine

### Frontiers Copyright Statement

© Copyright 2007-2019 Frontiers Media SA. All rights reserved. All content included on this site, such as text, graphics, logos, button icons, images, video/audio clips, downloads, data compilations and software, is the property of or is licensed to Frontiers Media SA ("Frontiers") or its licensees and/or subcontractors. The copyright in the text of individual articles is the property of their respective authors, subject to a license granted to Frontiers.

The compilation of articles constituting this e-book, wherever published, as well as the compilation of all other content on this site, is the exclusive property of Frontiers. For the conditions for downloading and copying of e-books from Frontiers' website, please see the Terms for Website Use. If purchasing Frontiers e-books from other websites or sources, the conditions of the website concerned apply.

Images and graphics not forming part of user-contributed materials may not be downloaded or copied without permission.

Individual articles may be downloaded and reproduced in accordance with the principles of the CC-BY licence subject to any copyright or other notices. They may not be re-sold as an e-book.

As author or other contributor you grant a CC-BY licence to others to reproduce your articles, including any graphics and third-party materials supplied by you, in accordance with the Conditions for Website Use and subject to any copyright notices which you include in connection with your articles and materials.

All copyright, and all rights therein, are protected by national and international copyright laws.

The above represents a summary only. For the full conditions see the Conditions for Authors and the Conditions for Website Use. ISSN 1664-8714 ISBN 978-2-88945-982-7 DOI 10.3389/978-2-88945-982-7


# FROM GWAS HITS TO TREATMENT TARGETS

Topic Editors:

Jeanette Erdmann, Universität zu Lübeck, Deutsches Zentrum für Herz-Kreislaufforschung, Germany

Tanja Zeller, Universität Hamburg, Deutsches Zentrum für Herz-Kreislaufforschung, Germany

Genome-wide association (GWA) studies, as a prototype of large-scale OMICs studies, have advanced our understanding of the genetic basis of many common diseases. For coronary artery disease (CAD) and cardiovascular risk factors such as lipids, blood pressure, and BMI, they have identified hundreds of chromosomal loci that modulate disease risk. Despite their scientific success, GWA studies have been criticized for having so far failed to deliver diagnostically or therapeutically relevant products. However, the prospects of achieving such goals have recently been strengthened by additional layers of OMICs data, including large-scale transcriptomics data, better annotation of regulatory sequences and epigenetic changes in the genome (e.g., through the ENCODE project), and novel bioinformatics tools, which together allow a systems medicine approach to be applied. All in all, the last decade, with its "gold rush of genomic discovery," has led to the identification of known and novel pathways involved in the pathogenesis of cardiovascular diseases and has pointed to novel treatment targets. This Research Topic gathers contributions from scientists working in cardiovascular genetics who share an interest in understanding the pathomechanisms that link genetic association findings to disease, with the ultimate aim of translating the findings of large-scale genetic studies into novel treatment options.

Citation: Erdmann, J., Zeller, T., eds. (2019). From GWAS Hits to Treatment Targets. Lausanne: Frontiers Media. doi: 10.3389/978-2-88945-982-7

# Table of Contents

*Evaluation of 71 Coronary Artery Disease Risk Variants in a Multiethnic Cohort*

Wangjing Ke, Kristin A. Rand, David V. Conti, Veronica W. Setiawan, Daniel O. Stram, Lynne Wilkens, Loic Le Marchand, Themistocles L. Assimes and Christopher A. Haiman

*Integrative Bioinformatics Approaches for Identification of Drug Targets in Hypertension*

Daiane Hemerich, Jessica van Setten, Vinicius Tragante and Folkert W. Asselbergs


Le Shu, Montgomery Blencowe and Xia Yang


Adam W. Turner, Doris Wong, Caitlin N. Dreisbach and Clint L. Miller

*Integrating Genes Affecting Coronary Artery Disease in Functional Networks by Multi-OMICs Approach*

Baiba Vilne and Heribert Schunkert

*Serum Biomarkers of Endothelial Dysfunction in Fabry Associated Cardiomyopathy*

Jefferson Loso, Natalie Lund, Maxim Avanesov, Nicole Muschol, Susanne Lezius, Kathrin Cordts, Edzard Schwedhelm and Monica Patten

*Exploring Coronary Artery Disease GWAs Targets With Functional Links to Immunometabolism*

Maria F. Hughes, Yvonne M. Lenighan, Catherine Godson and Helen M. Roche


# Evaluation of 71 Coronary Artery Disease Risk Variants in a Multiethnic Cohort

*Wangjing Ke<sup>1</sup>, Kristin A. Rand<sup>2</sup>, David V. Conti<sup>1</sup>, Veronica W. Setiawan<sup>1</sup>, Daniel O. Stram<sup>1</sup>, Lynne Wilkens<sup>3</sup>, Loic Le Marchand<sup>3</sup>, Themistocles L. Assimes<sup>4,5</sup> and Christopher A. Haiman<sup>1\*</sup>*

*<sup>1</sup> Department of Preventive Medicine, Keck School of Medicine of USC, Los Angeles, CA, United States; <sup>2</sup> Ancestry, San Francisco, CA, United States; <sup>3</sup> Epidemiology Program, University of Hawaii Cancer Center, Honolulu, HI, United States; <sup>4</sup> Department of Medicine, Stanford University School of Medicine, Stanford, CA, United States; <sup>5</sup> Cardiovascular Institute, Stanford University School of Medicine, Stanford, CA, United States*

Background: Coronary heart disease (CHD) is the most common cause of death worldwide. Previous studies have identified numerous common CHD susceptibility loci, with the vast majority identified in populations of European ancestry. How well these findings transfer to other racial/ethnic populations remains unclear.

### *Edited by:*

*Mete Civelek, University of Virginia, United States*

### *Reviewed by:*

*Thorsten Kessler, Deutsches Herzzentrum München, Germany; Casey E. Romanoski, University of Arizona, United States; Ingrid Braenne, University of Virginia, United States*

*\*Correspondence:*

*Christopher A. Haiman, Christopher.Haiman@med.usc.edu*

### *Specialty section:*

*This article was submitted to Cardiovascular Genetics and Systems Medicine, a section of the journal Frontiers in Cardiovascular Medicine*

> *Received: 06 November 2017; Accepted: 21 February 2018; Published: 14 March 2018*

### *Citation:*

*Ke W, Rand KA, Conti DV, Setiawan VW, Stram DO, Wilkens L, Le Marchand L, Assimes TL and Haiman CA (2018) Evaluation of 71 Coronary Artery Disease Risk Variants in a Multiethnic Cohort. Front. Cardiovasc. Med. 5:19. doi: 10.3389/fcvm.2018.00019*

Methods and Results: We examined the generalizability of associations at 71 known CHD loci in African American, Latino, and Japanese American men and women in the Multiethnic Cohort (6,035 cases and 11,251 controls). In the combined multiethnic sample, 78% of the loci showed odds ratios directionally consistent with those previously reported (*p* = 2 × 10<sup>−6</sup>), with this fraction ranging from 59% in Japanese Americans to 70% in Latinos. The number of nominally significant associations across all susceptibility regions ranged from only 1 in Japanese Americans to 11 in African Americans. Through locus fine-mapping, the most statistically significant association was observed for rs3832016 (OR = 1.16, *p* = 2.5 × 10<sup>−5</sup>) in the *SORT1* region on chromosome *1p13*. Lastly, to examine the cumulative predictive effect of CHD SNPs across populations with improved power, we created genetic risk scores (GRSs) that summarize each individual's aggregate exposure to risk variants. The GRSs were significantly associated with risk in African Americans (OR = 1.03 per allele; *p* = 4.1 × 10<sup>−5</sup>) and Latinos (OR = 1.03; *p* = 2.2 × 10<sup>−8</sup>), but not in Japanese Americans (OR = 1.01; *p* = 0.11).

Conclusions: While a sizable fraction of the known CHD loci appear to generalize in these populations, larger fine-mapping studies will be needed to localize the functional alleles and better define their contribution to CHD risk in these populations.

Keywords: coronary heart disease, genome wide association study (GWAS), multi-ethnic, African Americans, Latino American, Japanese Americans, *SORT1*

# Introduction

Coronary heart disease (CHD) is the most common chronic, life-threatening illness in the United States, affecting more than 11 million people (1). A twin study estimated the genetic contribution to variation in CHD mortality to be 0.57 in males and 0.38 in females (2). Genome-wide association studies (GWAS), conducted primarily in populations of European ancestry, have identified ~65 regions associated with CHD risk (3–11). Many of these loci were identified in a large study of 22,233 cases and 64,762 controls of European ancestry in the CARDIoGRAMplusC4D consortium, which reported 46 genome-wide significant variants with odds ratios ranging from 1.01 to 2.08 and effect allele frequencies of 0.06–0.91 (9). More recently, the same consortium reported 10 additional loci in a GWAS of 61,289 cases and 126,310 controls following imputation to the 1000 Genomes Project reference panel (12). Genome-wide scans have also revealed 7 CHD risk loci in Asian populations (13–17). The known genetic risk variants for CHD are estimated to explain only 10–11% of its heritability (9, 12), suggesting that many additional susceptibility loci remain to be discovered.

Several studies in Asian populations have reported successful replication of known CHD regions (17–20), with a reproducible disease association consistently noted for the *9p21* region. A limited number of studies have investigated risk associated with CHD variants in minority groups such as African Americans or Latinos (21–28). In 2011, a GWAS in African Americans found a SNP, rs1859023, located at *7q21* near the *PFTK1* gene to be significantly associated with CHD (22); however, this finding has never been replicated in African Americans or in any other racial/ethnic group. In a study of 8,090 African Americans (~700 CHD cases) that examined known CHD risk regions, only *9p21* was associated with CHD (25). In a study of 8,201 African Americans (~550 CHD cases) (26), investigators found effects directionally consistent with studies of European ancestry for 23 of 44 known loci (binomial *p* = 0.52), two of which were nominally statistically significant (rs599839 at *1p13/SORT1* and rs579459 at *9q34/ABO*). Genetic studies of CHD in Latino populations have been extremely limited. A Costa Rican study that examined only 14 CHD SNPs in 1,898 cases with MI and 2,096 controls found 7 variants at 3 regions (*SORT1*, *CXCL12*, and *9p21*) to be significantly associated with risk (29). Thus, additional studies are needed to understand the generalizability and relevance of the known CHD risk loci in populations of non-European ancestry.

In this context, the objective of this study was threefold. First, we wished to determine whether associations with 71 known CHD susceptibility variants from 65 independent regions generalize to African-American, Latino, and Japanese American men and women in the Multiethnic Cohort, a study that includes over 6,000 cases and 11,000 controls. Second, we evaluated common genetic variation across each susceptibility region to identify variants that might capture the risk associations in the multiethnic sample better than the index variants. Lastly, we constructed genetic risk scores (GRS) summarizing each individual's exposure to high-risk CHD alleles and evaluated the extent to which these scores contribute to population differences in CHD risk.

# Methods

### Study Population

The Multiethnic Cohort study (MEC) is a large prospective cohort established between 1993 and 1996. The MEC includes primarily African Americans, Japanese Americans, Native Hawaiians, Latinos, and European Americans living in Hawaii and California. Cohort members were recruited through Department of Motor Vehicles driver's license files, supplemented by voter registration and Health Care Financing Administration (Medicare) files. Participants were between 45 and 75 years of age and completed a detailed, 26-page self-administered questionnaire at cohort entry (baseline data, 1993–1996). The questionnaire covered basic demographic factors (including race/ethnicity and education), lifestyle factors (e.g., diet, medication use, and smoking history), and chronic medical conditions. Follow-up questionnaires administered in 1999 and 2003 updated participants' CHD status and lifestyle factors.

Several nested case-control studies have been assembled in the MEC for GWAS of a number of cancer and non-cancer traits (30–32), including breast cancer, prostate cancer, and type 2 diabetes, mainly in populations of non-European ancestry. In the current study, we identified CHD cases and non-cases within these nested studies for the genetic analysis of CHD risk SNPs.

The MEC study obtained written informed consent from study participants for genetic analysis, approval from the Health Sciences Institutional Review Board (HSIRB) at the University of Southern California, and IRB certification permitting data sharing in accordance with the NIH Policy for Sharing of Data Obtained in NIH Supported or Conducted Genome-Wide Association Studies (GWAS). Genetic data for the MEC are available in dbGaP (phs000517.v3.p1, phs000851.v1.p1, phs000356.v2.p1, phs000306.v4.p1, phs000683.v1.p1).

### CHD Case/Control Definitions

CHD cases were identified through linkage of the MEC to the California Hospital Discharge Data (CHDD; 1990–2012) and the Centers for Medicare and Medicaid Services (CMS) claim files (MedPAR, outpatient; 1999–2011). Hospital discharge information was not available for participants from Hawaii, who accounted for 76.6% of the Japanese American men and women. A CHD case was defined as having ischemic heart disease (ICD-9 codes DX 410–414) as the principal or first diagnosis code and the principal or first procedure code. We also included cases whose primary cause of death was myocardial infarction (ICD-9 DX410, ICD-10 I21) or another CHD condition (ICD-9 DX411–414, ICD-10 I20, I22–25). Both prevalent and incident CHD cases were included. Of the 6,035 CHD cases identified, 1,146 were identified from their baseline questionnaires at enrollment in the MEC, and most of these prevalent cases (1,122, 97.9%) were also identified from CHDD or Medicare.

Controls were participants with no history of heart attack or angina reported on the baseline questionnaire or on any subsequent follow-up questionnaire. Participants taking nitrates at blood draw in subsequent examinations were also excluded. Individuals with non-primary CHD diagnosis codes (i.e., positions 2–24) in the CHDD or Medicare data were excluded from being either a case or a control. A total of 11,251 controls were selected, of whom 8,307 had at least one previous Medicare or CHDD claim (and thus would have been identified as cases had a CHD diagnosis appeared in their claims). We performed a sensitivity analysis restricted to these controls with definite claim information.
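The case/control rules above can be read as a small decision procedure. The sketch below is illustrative only: the `Participant` fields are hypothetical rather than the MEC data schema, procedure codes and ICD-9 death codes are ignored for brevity, and the rule order is our simplified reading of the text, not the authors' code.

```python
# Hypothetical sketch of the CHD case/control rules described above.
# Field names and the simplified decision order are illustrative assumptions.
from dataclasses import dataclass, field
from typing import List, Optional


def _category(code: str) -> str:
    """Three-character ICD category, e.g. '410.71' -> '410', 'I21.4' -> 'I21'."""
    return code.replace(".", "").strip().upper()[:3]


def is_chd_icd9(code: str) -> bool:
    """ICD-9 410-414: ischemic heart disease."""
    cat = _category(code)
    return cat.isdigit() and 410 <= int(cat) <= 414


def is_chd_icd10(code: str) -> bool:
    """ICD-10 I20-I25: I21 (acute MI) plus I20 and I22-I25 (other CHD)."""
    cat = _category(code)
    return cat.startswith("I") and cat[1:].isdigit() and 20 <= int(cat[1:]) <= 25


@dataclass
class Participant:
    principal_dx: List[str] = field(default_factory=list)  # principal/first ICD-9 diagnoses (CHDD, Medicare)
    secondary_dx: List[str] = field(default_factory=list)  # non-primary diagnosis positions 2-24
    death_cause_icd10: Optional[str] = None                # underlying cause of death, if deceased
    baseline_chd: bool = False                             # prevalent CHD reported at cohort entry
    reported_mi_or_angina: bool = False                    # heart attack/angina on any questionnaire
    nitrates_at_blood_draw: bool = False


def classify(p: Participant) -> str:
    """Return 'case', 'control', or 'excluded' under a simplified reading of the rules."""
    death_chd = p.death_cause_icd10 is not None and is_chd_icd10(p.death_cause_icd10)
    if p.baseline_chd or death_chd or any(is_chd_icd9(c) for c in p.principal_dx):
        return "case"
    if any(is_chd_icd9(c) for c in p.secondary_dx):
        return "excluded"  # non-primary CHD codes: neither case nor control
    if p.reported_mi_or_angina or p.nitrates_at_blood_draw:
        return "excluded"  # fails the control definition
    return "control"
```

For example, `classify(Participant(principal_dx=["414.01"]))` returns `"case"`, while a participant whose only CHD code sits in a secondary diagnosis position is excluded from both groups.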

### Genotyping and Quality Control

We used genetic data generated from MEC case-control studies of breast cancer, prostate cancer, and type 2 diabetes in African Americans (2,976 males and 3,539 females), Japanese Americans (2,530 males and 2,132 females), and Latinos (3,340 males and 2,769 females). Genotyping was conducted on the Illumina platform with several arrays, including the Human 1M-Duo v3.0 BeadChip (31, 32), HumanOmni2.5-Quad BeadChip (33), Human 660W-Quad BeadChip (34), and the Cardio-MetaboChip (35) (**Table S1**). We applied the following exclusion criteria to remove samples with questionable genetic or phenotypic data: (1) unknown replicates across studies; (2) call rates < 95%; (3) sex mismatches, i.e., male samples with >10% mean heterozygosity of X-chromosome SNPs and/or <10% mean intensity of the Y chromosome, or female samples with <15% mean heterozygosity of X-chromosome SNPs and/or similar mean allele intensities of X- and Y-chromosome SNPs; (4) ancestry outliers (>4 standard deviations from the mean of the first or second principal component); and (5) first-degree relatives.

A subset of 2,717 African Americans (879 CHD cases and 1,838 controls) and 1,184 Japanese Americans (302 CHD cases and 882 controls) genotyped with the Cardio-MetaboChip were missing data for 20 of the 71 SNPs; these subjects were excluded from the risk score analysis.
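Exclusion criteria (2) to (4) listed above translate directly into per-sample filters. This is a minimal sketch under stated assumptions: the field names, the relative scale of the Y-intensity metric, and the precomputed "X/Y intensities similar" flag are hypothetical, and the replicate and relatedness checks (1) and (5) are omitted.

```python
# Illustrative sketch of sample-level exclusions (2)-(4); not the study's QC pipeline.
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import List, Set


@dataclass
class SampleQC:
    sample_id: str
    reported_sex: str           # "M" or "F"
    call_rate: float            # fraction of SNPs successfully called
    x_het: float                # mean heterozygosity of X-chromosome SNPs
    y_intensity: float          # mean Y-chromosome intensity (relative scale, assumed)
    xy_intensity_similar: bool  # True if mean X and Y allele intensities are similar
    pc1: float
    pc2: float


def sex_mismatch(s: SampleQC) -> bool:
    if s.reported_sex == "M":
        return s.x_het > 0.10 or s.y_intensity < 0.10
    return s.x_het < 0.15 or s.xy_intensity_similar


def ancestry_outliers(samples: List[SampleQC], n_sd: float = 4.0) -> Set[str]:
    """IDs lying more than n_sd standard deviations from the mean of PC1 or PC2."""
    flagged: Set[str] = set()
    for attr in ("pc1", "pc2"):
        values = [getattr(s, attr) for s in samples]
        mu, sd = mean(values), pstdev(values)
        if sd == 0:
            continue  # no spread on this component, nothing to flag
        flagged |= {s.sample_id for s in samples if abs(getattr(s, attr) - mu) > n_sd * sd}
    return flagged


def failing_samples(samples: List[SampleQC]) -> Set[str]:
    outliers = ancestry_outliers(samples)
    return {
        s.sample_id
        for s in samples
        if s.call_rate < 0.95 or sex_mismatch(s) or s.sample_id in outliers
    }
```

Note that the mean and SD are computed from the samples being screened, outliers included, which matches a one-pass reading of "4 standard deviations from the mean"; iterative re-estimation after removal is a common alternative.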

### SNP Imputation and Principal Components Analysis

All samples except the African American and Japanese American samples genotyped with the Metabochip were imputed with IMPUTE2 to the 1000 Genomes Project Phase 1 v3 reference panel, using build 37 (hg19) coordinates. Principal components were calculated separately for each study with smartpca from EIGENSOFT (36), using a random selection of 10,000 SNPs across the genome (MAF >5% and call rate >95%).
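The idea behind the principal components step can be shown on a toy scale. The sketch below is a pure-Python illustration in the spirit of smartpca, not smartpca itself: each SNP is centered and scaled by its allele frequency, a sample-by-sample covariance matrix is formed, and the leading eigenvector is found by power iteration. Real analyses use thousands of SNPs and optimized linear algebra; further components would be obtained by deflation.

```python
# Toy genotype PCA: frequency-standardized SNPs, sample covariance, power iteration.
import math
import random
from typing import List


def standardize(genotypes: List[List[int]]) -> List[List[float]]:
    """genotypes[i][j] = allele count (0, 1, 2) of SNP j in sample i."""
    n, m = len(genotypes), len(genotypes[0])
    x = [[0.0] * m for _ in range(n)]
    for j in range(m):
        p = sum(g[j] for g in genotypes) / (2 * n)  # allele frequency
        sd = math.sqrt(p * (1 - p))
        if sd == 0:
            continue  # monomorphic SNP carries no information
        for i in range(n):
            x[i][j] = (genotypes[i][j] - 2 * p) / sd
    return x


def top_pc(genotypes: List[List[int]], iters: int = 200, seed: int = 0) -> List[float]:
    """Sample scores on the first principal component (unit-length eigenvector)."""
    x = standardize(genotypes)
    n, m = len(x), len(x[0])
    cov = [[sum(x[a][s] * x[b][s] for s in range(m)) / m for b in range(n)] for a in range(n)]
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(n)) for a in range(n)]
        norm = math.sqrt(sum(t * t for t in w))
        v = [t / norm for t in w]
    return v
```

With two strongly differentiated groups of samples, the first component separates them by sign, which is the property the ancestry-outlier rule in the QC step relies on.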

### Statistical Analysis

The log-additive effect of each SNP on CHD risk was estimated in PLINK using unconditional logistic regression adjusted for age, sex, BMI, and the first 10 principal components to account for potential population stratification (37). All analyses were stratified by ethnicity and by disease status in the parent studies (i.e., breast cancer, prostate cancer, or type 2 diabetes status). METAL was used to combine results within and across populations; the overall meta-analysis of all populations included 18 case-control strata. All imputed SNPs had an IMPUTE2 INFO score >0.8 in each study and population. SNPs rs11752643 and rs3782886 in African Americans, rs180803 in Latinos, and rs6544713, rs4252120, rs2023938, rs3918226, rs3184504, and rs9982601 in Japanese Americans had a minor allele frequency below 1% and were excluded from the ethnic-specific analyses. The cross-ethnic meta-analysis was performed on SNPs observed in at least two ethnic groups.
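Combining per-stratum results can be sketched as a fixed-effect, inverse-variance-weighted meta-analysis of log odds ratios, which is what METAL's standard-error scheme computes. The text does not state which METAL scheme was used, so this is an assumption, and the strata values in the example are made up.

```python
# Fixed-effect inverse-variance meta-analysis of per-stratum log-ORs (illustrative).
import math
from typing import Sequence, Tuple


def ivw_meta(betas: Sequence[float], ses: Sequence[float]) -> Tuple[float, float, float]:
    """Combine per-stratum log-ORs; returns (pooled log-OR, its SE, two-sided p)."""
    weights = [1.0 / se ** 2 for se in ses]
    beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return beta, se, p


# Hypothetical strata (e.g., ethnicity x parent study); the overall analysis had 18.
beta, se, p = ivw_meta([0.12, 0.08, 0.15], [0.05, 0.06, 0.09])
pooled_or = math.exp(beta)
```

Strata with smaller standard errors (typically larger ones) dominate the pooled estimate; a sample-size-weighted Z-score scheme would instead combine signed Z statistics.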

In addition to testing the index SNP, we examined regional replication of each signal by testing SNPs in linkage disequilibrium (LD) with the index SNP in European-ancestry populations (r<sup>2</sup> ≥ 0.4 in the 1000 Genomes Project EUR panel). Haploview (38) was used to select pairwise tag SNPs among bins of markers in the AFR population [tagging r<sup>2</sup> ≥ 0.8 for SNPs with MAF >1%, based on 1000 Genomes Project data (39)]. For each region, the alpha threshold for significance was set at 0.05 divided by the number of AFR tag SNPs. We considered a region to show evidence of replication when one or more SNPs in LD with the index SNP had a *p*-value below the region-specific alpha threshold. Only SNPs imputed with high quality (IMPUTE2 INFO score >0.8) were included in regional replication testing. Regional association plots were generated with LocusZoom (40).
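The region-specific threshold can be sketched in two steps: choose tag SNPs so that every SNP has r<sup>2</sup> ≥ 0.8 with at least one tag, then divide 0.05 by the number of tags. The greedy selection below is a simplification of Haploview's pairwise tagging, not its exact algorithm, and the SNP IDs and r<sup>2</sup> values in the test are hypothetical.

```python
# Greedy pairwise tagging and a Bonferroni-style regional threshold (illustrative).
from typing import Dict, FrozenSet, List, Tuple


def greedy_tags(snps: List[str], r2: Dict[FrozenSet[str], float], threshold: float = 0.8) -> List[str]:
    def ld(a: str, b: str) -> float:
        return 1.0 if a == b else r2.get(frozenset((a, b)), 0.0)

    untagged = set(snps)
    tags: List[str] = []
    while untagged:
        # pick the SNP that captures the most still-untagged SNPs (ties: first in list)
        best = max(snps, key=lambda s: sum(ld(s, t) >= threshold for t in untagged))
        tags.append(best)
        untagged -= {t for t in untagged if ld(best, t) >= threshold}
    return tags


def regional_hits(pvalues: Dict[str, float], n_tags: int, alpha: float = 0.05) -> Tuple[float, Dict[str, float]]:
    """Region-specific threshold and the LD-proxy SNPs whose p-values fall below it."""
    threshold = alpha / n_tags
    return threshold, {snp: p for snp, p in pvalues.items() if p < threshold}
```

Counting tags in AFR, where LD is typically weaker, yields more tags and therefore a stricter threshold than counting in EUR would.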

We also examined the aggregate effect of the CHD risk loci. Three genetic risk scores (GRS) were calculated for each individual: (1) an unweighted GRS comprising the risk alleles of the 71 CHD SNPs; (2) a modified unweighted GRS (I) that, at each known CHD locus, substitutes the index SNP with the lead SNP reaching region-wide significance within the given racial/ethnic group; and (3) a modified unweighted GRS (II), similar to I, that substitutes the index SNPs with the lead SNPs in each region from our cross-ethnic meta-analysis. Risk alleles for the substitution SNPs were determined from their observed effects in our study. As noted above, subjects genotyped with the non-GWAS Metabochip were excluded from the risk score analysis because of missing data for 20 SNPs. Risk score distributions were compared across ethnic groups using a two-sided *t*-test. The association of the genetic risk scores with CHD was evaluated within each ethnicity using logistic regression adjusted for age, sex, BMI, and the first 10 principal components. Of the 71 selected SNPs, only one pair (rs16986953 and rs2123536, at *TTC32-WDR35*) was correlated. Because the association between rs2123536 and CHD had been observed only in a Chinese population (16), both SNPs were kept in the GRS analysis.
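An unweighted GRS simply counts risk alleles, flipping SNPs whose coded allele is the non-risk allele; imputed dosages may be fractional. The substitution helper mirrors the idea behind modified scores I and II. This is a sketch, not the authors' code, and the SNP IDs are hypothetical.

```python
# Unweighted genetic risk score with allele orientation, plus lead-SNP substitution.
from typing import Dict, List, Sequence


def unweighted_grs(dosages: Sequence[float], coded_is_risk: Sequence[bool]) -> float:
    """dosages[k]: count (0-2) of the coded allele at SNP k."""
    if len(dosages) != len(coded_is_risk):
        raise ValueError("one risk-allele flag per SNP is required")
    return sum(d if risk else 2.0 - d for d, risk in zip(dosages, coded_is_risk))


def substitute_leads(index_snps: List[str], lead_for_index: Dict[str, str]) -> List[str]:
    """Replace an index SNP with its regional lead SNP where one reached region-wide significance."""
    return [lead_for_index.get(snp, snp) for snp in index_snps]


snps = substitute_leads(["rs_index1", "rs_index2"], {"rs_index2": "rs_lead2"})
score = unweighted_grs([1.0, 0.2], [True, False])  # 1.0 + (2 - 0.2)
```

Because every SNP contributes equally, the score is in units of risk alleles, which is why the associations are reported per allele.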

Within each population, statistical power for each SNP was calculated with the R package "gap" (41), using the allele frequency in each racial/ethnic group and the OR reported in the literature. The allele frequency for the multiethnic sample was weighted by the sample size of each ethnic group. Power to detect associations with rare and common alleles within each ethnic group was calculated using QUANTO (42).

# Results

Descriptive characteristics of the 6,035 CHD cases and 11,251 controls stratified by sex and race/ethnicity are presented in **Table 1**. We analyzed a total of 2,376 African-American cases and 4,139 controls, 2,291 Latino cases and 3,818 controls, and 1,368 Japanese American cases and 3,294 controls. In general, compared with controls, CHD cases were slightly older at cohort entry and, in all three ethnic groups, were heavier and more likely to have ever smoked (**Table 1**). The associations of BMI and smoking with CHD were similar when further stratified by prevalent conditions, including prostate cancer, breast cancer, and diabetes (**Table S2**).

### Table 1 | Descriptive Characteristics of CHD Cases and Controls

*\*Percentages may not total 100% owing to missing data.*

*A priori*, we had greater than 80% power to detect the reported per-allele effect sizes for 6 of the 71 SNPs in African Americans, 9 in Latinos, 9 in Japanese Americans, and 16 when combining samples from all three ethnic groups (**Figure S1**). Given the sample size in each ethnic group, we had 28.5% power to detect an OR of 1.12 (the mean OR of the selected index SNPs) for a rare allele (MAF = 0.05) and 71.6% power to detect the same OR for a common allele (MAF = 0.20) in African Americans; 27.3% power for a rare allele and 69.3% power for a common allele in Latinos; and 20% power for a rare allele and 52.7% power for a common allele in Japanese Americans.
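The order of magnitude of these power figures can be checked with the textbook normal approximation for a two-sided allelic (allele-count) test. This is not QUANTO's or the "gap" package's method; alpha = 0.05, a multiplicative per-allele OR, and a control allele frequency equal to the MAF are our assumptions. With the African American sample sizes given above it gives values close to those reported.

```python
# Approximate power of a two-sided allelic test for a per-allele OR (normal approximation).
from statistics import NormalDist


def allelic_power(n_cases: int, n_controls: int, maf: float, odds_ratio: float, alpha: float = 0.05) -> float:
    p0 = maf                                           # risk-allele frequency in controls
    p1 = odds_ratio * p0 / (1 - p0 + odds_ratio * p0)  # frequency in cases implied by the OR
    se = (p1 * (1 - p1) / (2 * n_cases) + p0 * (1 - p0) / (2 * n_controls)) ** 0.5
    z = abs(p1 - p0) / se
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(z - z_crit) + NormalDist().cdf(-z - z_crit)


# African American sample: 2,376 cases and 4,139 controls, OR = 1.12
common = allelic_power(2376, 4139, 0.20, 1.12)  # roughly 0.71
rare = allelic_power(2376, 4139, 0.05, 1.12)    # roughly 0.28
```

The comparison illustrates why modest per-allele effects at low-frequency alleles were often not replicated: at a fixed OR, halving the variance contributed by the allele sharply reduces the expected test statistic.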

We examined evidence of replication for 71 CHD variants from 65 regions (**Table S3**). Of these variants, 69 in African Americans, 70 in Latinos, and 65 in Japanese Americans had a MAF >1% and were included in the analysis. Compared with the null expectation that half of the examined SNPs would show the previously reported direction of effect, the same direction of association was observed for 65.2% of SNPs in African Americans (45 of 69; binomial *p* = 0.008), 70.0% in Latinos (49 of 70; binomial *p* = 5.5 × 10<sup>−4</sup>), 58.5% in Japanese Americans (38 of 65; binomial *p* = 0.11), and 77.5% in the combined multiethnic sample (55 of 71; binomial *p* = 2.0 × 10<sup>−6</sup>). In African Americans, nominally statistically significant associations (*p* < 0.05) with consistent directional effects were observed for 11 index SNPs in *PPAP2B*, *SORT1*, *IL6R*, *REST-NOA1*, *BTN2A1*, *SLC22A3-LPAL2-LPA*, *9p21*, *CXCL12*, *SH2B3*, and *KCNE2*. In Latinos, nominal evidence of association (*p* < 0.05) with consistent directional effects was observed for 8 index SNPs at *SORT1*, *APOB*, *NOS3*, *LPL*, *ZNF259-APOA5-APOA1*, *MFGE8-ABHD2*, *FURIN-FES*, and *BCAS3*. In Japanese Americans, only 1 index SNP, at *9p21*, was nominally significant and directionally consistent. In the combined multiethnic sample, 10 index SNPs at *PPAP2B*, *SORT1*, *IL6R*, *REST-NOA1*, *EDNRA*, *PHACTR1*, *BTN2A1*, *NOS3*, *9p21*, and *CXCL12* were directionally consistent and nominally statistically significant.
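The directional-consistency comparison is a sign test: under the null, each SNP matches the published direction with probability 0.5. The sketch below computes the exact one-sided binomial tail; treating the test as one-sided is our assumption about how the reported binomial *p*-values were obtained.

```python
# One-sided exact binomial (sign) test of directional consistency.
from math import comb


def sign_test_p(k: int, n: int) -> float:
    """P(X >= k) for X ~ Binomial(n, 0.5)."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n


p_aa = sign_test_p(45, 69)  # African Americans: 45 of 69 consistent
p_ja = sign_test_p(38, 65)  # Japanese Americans: 38 of 65 consistent
```

Exact integer arithmetic via `math.comb` avoids the precision loss of summing tiny floating-point probabilities when n is large.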

When examining SNPs correlated with the index SNPs, we observed evidence of regional replication for 6 regions in African Americans, 3 in Latinos, 1 in Japanese Americans, and 10 in the combined sample (**Table S4**; see Methods). In four regions, the previously reported index SNP was not significant at the 0.05 level, but correlated SNPs with *p*-values below the region-specific significance threshold were detected: *SLC22A4-SLC22A5* and *RAI1-PEMT-RASD1* in African Americans, *TTC32-WDR35* in the multiethnic analysis, and *APOE-APOC1* in Latinos and the multiethnic sample.

The most statistically significant association was observed at the *SORT1* locus (**Figure 1**). Two index SNPs in complete LD (rs602633 and rs599839) were initially reported from GWAS in European-ancestry populations. The index SNP rs602633 was associated with risk in African Americans (OR = 1.13; *p* = 0.004), Latinos (OR = 1.11; *p* = 0.04), and the cross-ethnic meta-analysis (OR = 1.11; *p* = 7.8 × 10<sup>−4</sup>), but not in Japanese Americans (OR = 1.01; *p* = 0.88). The most significant association in the region was with rs3832016 (OR = 1.16; *p* = 2.5 × 10<sup>−5</sup> in the multiethnic sample), an insertion/deletion (−/T) in high LD with rs602633 in EUR (r<sup>2</sup> = 0.96), with a MAF of 0.35 in African Americans, 0.20 in Latinos, and 0.07 in Japanese Americans. A previous fine-mapping study of the *SORT1* region at *1p13* implicated a nearby non-coding polymorphism (rs12740374) as the likely functional variant, one that affects lipoprotein metabolism (43). SNP rs12740374 is in high LD not only with the index SNP rs602633 (r<sup>2</sup> = 0.90 in EUR) but also with rs3832016 (r<sup>2</sup> = 0.94 in EUR). Variant rs12740374 was less strongly associated with risk in the current study (*p* = 0.008 in African Americans, MAF = 0.25; *p* = 0.08 in Latinos, MAF = 0.20; *p* = 0.93 in Japanese Americans, MAF = 0.07; and *p* = 0.003 in the combined multiethnic analysis).

Other regions with evidence of regional replication in African Americans include *PPAP2B* (rs72664341, *p* = 0.00018), *SLC22A4-SLC22A5* (rs17689550, *p* = 0.006), *SLC22A3-LPAL2-LPA* (rs4709431, *p* = 0.0077), *SH2B3* (rs10774625, *p* = 0.0047), and *RAI1-PEMT-RASD1* (rs9899364, *p* = 4.5 × 10<sup>−4</sup>). In Latinos, evidence of regional replication was observed at *MFGE8-ABHD2* (rs8037001, *p* = 0.0017), *FURIN-FES* (rs8182016, *p* = 1.1 × 10<sup>−4</sup>), and *APOE-APOC1* (rs7412, *p* = 0.0043). In Japanese Americans, regional replication was observed only at *9p21* (rs10811656, *p* = 0.0015). Five of the 10 regions that replicated in the multiethnic analysis were also significant in ethnic-specific analyses, whereas the remaining five (*TTC32-WDR35*, *APOB*, *EDNRA*, *PHACTR1*, and *BCAS3*; **Table S4**) reached region-wide significance only in the combined multiethnic analysis.

Genetic risk scores (GRSs) were used to compare the distribution of genetic risk between populations. Japanese Americans carried, on average, more risk alleles (70.26 ± 4.61, mean ± SD) than African Americans and Latinos (67.03 ± 4.73 and 68.37 ± 5.11, respectively) (**Table 2**; **Table S5**), which shifted the GRS distribution to the right in Japanese Americans relative to African Americans and Latinos (**Figure 2**). GRSs were slightly higher in cases than in controls in every group (two-sided *t*-test: African Americans *p* = 4.4 × 10<sup>−5</sup>, Latinos *p* = 4.5 × 10<sup>−7</sup>, Japanese Americans *p* = 0.28). The GRS distributions changed only slightly when we included regionally significant lead SNPs from each ancestry (modified risk score I) or from the cross-ethnic meta-analysis (modified risk score II) (**Table S5**). Average risk scores remained highest in Japanese Americans, whereas differences between African Americans and Latinos were reduced, especially when comparing CHD cases from these two ethnic groups (modified risk score II, *p* = 0.14).
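A two-sided comparison of GRS values between cases and controls can be sketched as a Welch *t*-test. Whether a Welch or pooled-variance test was used is not stated, so the Welch form is an assumption; with thousands of subjects per group the *t* distribution is close to normal, so this sketch uses a normal approximation for the *p*-value. The example scores in the test are made up.

```python
# Two-sided Welch t-test with a large-sample normal approximation (illustrative).
from statistics import NormalDist, mean, variance
from typing import Sequence, Tuple


def welch_t(cases: Sequence[float], controls: Sequence[float]) -> Tuple[float, float]:
    se = (variance(cases) / len(cases) + variance(controls) / len(controls)) ** 0.5
    t = (mean(cases) - mean(controls)) / se
    p = 2.0 * NormalDist().cdf(-abs(t))  # two-sided, normal approximation
    return t, p
```

For small groups the normal approximation understates the *p*-value; an exact *t* distribution with Welch-Satterthwaite degrees of freedom would then be preferable.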

The unweighted risk scores were statistically significantly associated with CHD risk in African Americans (per-allele OR = 1.03, *p* = 4.1 × 10<sup>−5</sup>) and Latinos (OR = 1.03, *p* = 2.2 × 10<sup>−8</sup>), but only weakly and non-significantly associated with CHD risk in Japanese Americans (OR = 1.01, *p* = 0.11) (**Table 2**). Comparing individuals with GRSs in the top quartile with those in the bottom quartile, both African Americans (OR = 1.40) and Latinos (OR = 1.39) showed a statistically significant ~40% increase in risk (**Table 2**). The analogous increase was smaller (~10%) and not significant in Japanese Americans (OR = 1.09). Results were similar for the modified risk scores (**Table 2**).

### Table 2 | Associations of the genetic risk score with CHD by ethnicity

*\*Two-sided t-test*

*†Logistic regression model adjusted for age, sex, BMI, and the first 10 principal components*

*‡Risk score that includes ethnic-specific regional leading SNPs*

*§Risk score that includes cross-ethnic regional leading SNPs*
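The per-allele and quartile ORs can be related with a back-of-the-envelope check (our illustration, not an analysis from the study). Assuming the GRS is roughly normal with standard deviation σ and the per-allele OR is constant, the expected top- versus bottom-quartile OR is the per-allele OR raised to the gap between the two quartile means, which is about 2.54σ because the mean of a standard normal above its 75th percentile is pdf(z<sub>0.75</sub>)/0.25 ≈ 1.27.

```python
# Expected top- vs bottom-quartile OR implied by a constant per-allele OR (normal GRS assumed).
import math
from statistics import NormalDist


def expected_quartile_or(per_allele_or: float, grs_sd: float) -> float:
    std = NormalDist()
    z75 = std.inv_cdf(0.75)
    upper_mean = std.pdf(z75) / 0.25  # about 1.27 SD above the overall mean
    gap = 2.0 * upper_mean * grs_sd   # top- minus bottom-quartile mean GRS (by symmetry)
    return math.exp(gap * math.log(per_allele_or))


# African Americans: per-allele OR 1.03 and GRS SD 4.73, as reported above
aa = expected_quartile_or(1.03, 4.73)  # roughly 1.43, close to the observed 1.40
```

The agreement is approximate: the reported ORs are rounded and covariate-adjusted, and the SD used here is for the whole ethnic group rather than controls only.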

To evaluate the effect of existing conditions on the results, we repeated the analysis excluding cancer or diabetes cases; the ORs were comparable to those observed in each ethnic group and in the entire sample (**Table S6**).

We also performed a sensitivity analysis of the selected index SNPs with controls restricted to those with medical claims from Medicare or CHDD. Despite the loss of statistical power due to the smaller sample size, effect sizes were comparable to those observed with the entire control sample (**Table S7**).

### Discussion

We evaluated 71 SNPs associated with CHD risk within 65 risk regions in a large multi-ethnic sample of African Americans, Latinos, and Japanese Americans and found that a statistically significant proportion of SNPs exhibited consistent directions of effect, beyond the 50% expected by chance. However, only 11, 8, and 1 of these SNPs were nominally statistically significant in African Americans, Latinos, and Japanese Americans, respectively. Exploring common genetic variation in these CHD-associated regions provided additional support for association at 10 regions, with different ethnic-specific or cross-ethnic leading SNPs. These replication results provide further evidence for shared common genetic effects across ethnicities; previous studies had replicated signals only at *9p21* (21, 24–26), *SORT1* (26), and *ABO* (26) in African Americans, and at *SORT1*, *CXCL12*, and *9p21* in Latinos (29). This is the first report of *BTN2A1*, a region initially reported in Japanese populations, replicating in African Americans.

Japanese Americans had a higher average GRS than African Americans and Latinos. However, the GRS was more strongly associated with CHD in African Americans and Latinos than in Japanese Americans. We note that the genetic markers reported in previous discovery efforts are unlikely to be the functional alleles. The correlation between index and functional SNPs may vary with the LD structure of each ancestral group, which may contribute to the differences in ethnic-specific odds ratios. Beyond our limited statistical power to replicate associations with index SNPs within and across these populations, differences in LD may offer an alternative explanation for the lack of replication. To address these issues, we conducted regional association testing and constructed modified risk scores incorporating the regional association results. Substituting the index SNPs with leading SNPs from the regional analyses changed the modified risk score distributions and per-allele aggregate effects only slightly, and differences remained, particularly between the Japanese and the other populations. The reasons for these differences are unclear. Our findings may reflect the severity of subclinical coronary atherosclerosis among Japanese participants in the MEC, which is on average greater than that observed in Africans and Hispanics (44). Although our analyses are preliminary, we consider it unlikely that these known risk alleles are major contributors to racial/ethnic differences in the incidence of CHD, as the incidence of CHD in Japanese is lower than that in the other two groups (45). Given that Japanese participants had a higher average GRS than the other ethnic groups but a lower population risk, functional variants within CHD susceptibility genes that are not included in our GRS may disproportionately affect non-Japanese racial/ethnic groups. Alternatively, environmental risk factors such as suboptimal diet and smoking may be less prevalent in Japanese and primarily responsible for their lower rates of CHD despite higher genetic risk.

It is difficult to compare the GRS distribution reported in this study directly with those in studies of European ancestry populations, as the methods and number of selected SNPs vary (46–53). Most studies in European ancestry populations have observed statistically significant per-allele relative risks of 1.02–1.12 and relative risks of 1.5–1.9 when comparing the highest with the lowest quintile or quartile of the GRS. Our findings in African Americans and Latinos are generally consistent with these reports, albeit with smaller effect sizes, perhaps owing to differences in LD between the index and functional SNPs.

Our study has a number of limitations. First, CHD cases and controls were defined using a combination of health care claims data and self-reported questionnaire data. Some Japanese cases from Hawaii may have been missed owing to the lack of CHDD records. Of the 1,089 Japanese participants with available CHDD records (in California), 426 CHD cases were identified, 103 of whom were classified as cases based solely on CHDD records. Assuming the same ratio, about 338 Japanese CHD cases from Hawaii, where CHDD was not available, may have been misclassified as controls. Assuming an equal distribution of genotypes in these missed cases and in recognized cases, this misclassification would bias effects towards the null and reduce power to detect associations. Similar misclassification may have occurred more broadly, as Medicare or CHDD data were not available for all controls. In the sensitivity analysis limiting controls to those with claims data, fewer SNPs reached nominal statistical significance (*p* < 0.05); however, effect sizes were comparable to those observed in the entire control sample. To increase specificity when using Medicare and CHDD claims, we included as CHD cases only individuals identified from the primary and first diagnosis codes; individuals with CHD identified beyond the primary diagnosis were excluded from being either a case or a control. The validity of our case and control definitions is indirectly supported by the observed associations of case-control status with known risk factors and by the detection of more directionally consistent genetic associations than expected. Another limitation is the potential inclusion of mislabeled CHD deaths, as CHD is often reported on death certificates when the cause of death is unclear. However, of the 1,005 CHD deaths, 718 also had prior claims data from CHDD or Medicare (**Figure S2**), suggesting high consistency between mortality records and health care claims.

Another limitation of the study was that CHD cases and controls were selected from MEC participants conditional on three existing medical conditions. However, in sensitivity analyses limited to those without existing conditions, effect sizes and effect directions were consistent between this subset and the entire sample.

We present the largest replication study of established GWAS loci for CHD in Latinos and Japanese Americans conducted to date. However, even in the combined multiethnic sample, our power to detect the originally reported effect sizes was limited. We observed a higher GRS in Japanese Americans than in African Americans or Latinos, but, paradoxically, the GRS was not significantly associated with CHD risk in Japanese Americans despite strong associations in the other two groups. Substantially larger samples that include multiple racial/ethnic groups will help to identify the functional alleles in these regions and to characterize their associations with CHD risk and their contributions to CHD disparities among ethnically diverse populations.

### Ethics Statement

The MEC study obtained informed consent from study participants and approval from the Health Science Review Board (HSIRB) at the University of Southern California, and obtained IRB certification permitting data sharing in accordance with the NIH Policy for Sharing of Data Obtained in NIH Supported or Conducted Genome-Wide Association Studies (GWAS).

### References


### Author Contributions

WK conducted the analysis and wrote the paper. KR contributed to the analysis. DC contributed to the analysis. VS contributed data for the manuscript. DS contributed to the analysis of the manuscript. LW contributed data for the manuscript. LL contributed data for the manuscript. TA contributed data and assisted with the writing of the manuscript. CH oversaw the project, contributed to the data, analysis and writing of the manuscript.

### Funding

This study was funded by the National Cancer Institute Grant Number 2U01CA164973-06 and NHGRI grant U01 HG007397.

### Supplementary Material

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fcvm.2018.00019/full#supplementary-material

Table S1 | Genetic arrays by studies.

Table S2 | Detailed demographic information on participants by ethnicity, gender, and existing conditions.

Table S3 | Associations with CHD SNPs in the MEC populations.


Table S6 | Sensitivity analysis: comparing subpopulation with/without existing conditions.

Table S7 | Sensitivity analysis: associations with CHD SNPs using controls with claims.


offspring, studies. *Atherosclerosis* (2012) 223(2):421–6. doi: 10.1016/j.atherosclerosis.2012.05.035

53. Ganna A, Magnusson PK, Pedersen NL, de Faire U, Reilly M, Arnlöv J, et al. Multilocus genetic risk scores for coronary heart disease prediction. *Arterioscler Thromb Vasc Biol* (2013) 33(9):2267–72. doi: 10.1161/ATVBAHA.113.301218

**Conflict of Interest Statement:** Author KR is currently employed by Ancestry.com. All other authors declare no competing interests.

The reviewer IB and handling Editor declared their shared affiliation.

*Copyright © 2018 Ke, Rand, Conti, Setiawan, Stram, Wilkens, LeMarchand, Assimes and Haiman. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Integrative Bioinformatics Approaches for Identification of Drug Targets in Hypertension

*Daiane Hemerich 1,2, Jessica van Setten 1, Vinicius Tragante 1 and Folkert W. Asselbergs 1,3,4,5\**

*1 Department of Cardiology, University Medical Center Utrecht, University of Utrecht, Utrecht, Netherlands, 2 CAPES Foundation, Ministry of Education of Brazil, Brasília, Brazil, 3 Durrer Center for Cardiovascular Research, Netherlands Heart Institute, Utrecht, Netherlands, 4 Institute of Cardiovascular Science, Faculty of Population Health Sciences, University College London, London, United Kingdom, 5 Farr Institute of Health Informatics Research and Institute of Health Informatics, University College London, London, United Kingdom*

### *Edited by:*

*Jeanette Erdmann, University of Lübeck, Germany*

### *Reviewed by:*

*Pallavi R. Devchand, Icahn School of Medicine at Mount Sinai, United States; Melanie Boerries, Deutsches Krebsforschungszentrum (DKFZ), Germany; Yuqi Zhao, University of California, Los Angeles, United States*

*\*Correspondence:*

*Folkert W. Asselbergs F.W.Asselbergs@umcutrecht.nl*

### *Specialty section:*

*This article was submitted to Cardiovascular Genetics and Systems Medicine, a section of the journal Frontiers in Cardiovascular Medicine*

> *Received: 19 January 2018 Accepted: 12 March 2018 Published: 04 April 2018*

### *Citation:*

*Hemerich D, van Setten J, Tragante V and Asselbergs FW. (2018) Integrative Bioinformatics Approaches for Identification of Drug Targets in Hypertension. Front. Cardiovasc. Med. 5:25. doi: 10.3389/fcvm.2018.00025*

High blood pressure, or hypertension, is an established risk factor for a myriad of cardiovascular diseases. Genome-wide association studies have found over nine hundred loci that contribute to blood pressure. However, the mechanisms through which these loci contribute to disease remain largely undetermined, as fewer than 10% of hypertension-associated variants are located in coding regions. Phenotypic cell-type specificity analyses and expression quantitative trait loci show predominant vascular and cardiac tissue involvement for blood pressure-associated variants. Maps of chromosomal conformation and expression quantitative trait loci (eQTL) in critical tissues identified 2,424 genes interacting with blood pressure-associated loci, of which 517 are druggable. Integrating genome, regulome and transcriptome information in relevant cell types could help to functionally annotate blood pressure-associated loci and identify drug targets.

Keywords: hypertension, blood pressure, epigenetic regulation, GWAS, data integration, functional annotation, drug target identification.

# Introduction

Elevated blood pressure (BP), or hypertension, is a heritable chronic disorder (1–3) and is considered the single largest contributing risk factor to disease burden and premature mortality (4). High systolic and/or diastolic BP reflects a higher risk of cardiovascular diseases (4). To date, genome-wide association studies (GWAS) have associated 905 loci with BP traits (systolic BP, SBP; diastolic BP, DBP; and pulse pressure, PP) (**Table S1**) (5–33). Larger sample sizes have helped to identify additional variants, as demonstrated by the most recent study, which included over 1 million people and identified 535 novel BP loci (33). Still, this collective effort has not yet elucidated the complete genetic contribution to BP, which is estimated to be approximately 50–60% (34).

To add to this complexity, 90.7% of the 905 BP-associated index variants are located in intronic or intergenic regions (**Table S1**). Causal variants are also difficult to pinpoint because of linkage disequilibrium (LD) (35). There is now extensive evidence that disease-associated non-coding variants disrupt the action of regulatory elements that are crucial in the tissues relevant to that particular disease (36). BP loci are linked not only to cardiovascular disease but also to other diseases (**Figure 1**), suggesting that BP-associated variants can result in a wide range of phenotypes. Tissue specificity of genetic loci may be relevant for organ-specific disease progression. For example, variants altering expression in the heart may be more likely to affect disease progression through heart-mediated rather than kidney-mediated processes; accordingly, some patients may develop left ventricular hypertrophy while others develop nephropathy. Thus, investigating the influence of BP variants in critical cell types is essential for understanding disease risk and biology, and for assessing whether an associated locus can be translated into a drug target.

The public availability of regulatory annotations in several tissues from projects such as ENCODE (39), Roadmap (40) and GTEx (41, 42) has enabled the integration of epigenetic modifications, expression quantitative trait loci (eQTLs) and other -omics information with GWAS data. Integrative approaches are useful for prioritizing genes from known GWAS loci for functional follow-up, detecting novel gene-trait associations, inferring the directions of associations, and assessing potential druggability (43–46).

different traits. Beige: body measurements (height, body mass index (BMI), weight, waist/hip ratio, hip circumference, waist circumference. *N* = 358). Red: lipids (high-density lipoprotein (HDL), low-density lipoprotein (LDL), triglycerides, total cholesterol. *N* = 226). Yellow: coronary artery disease (CAD)/myocardial infarction (MI) (*N* = 206). Blue: schizophrenia (*N* = 135). Orange: years of education attendance (*N* = 101). Light green: creatinine (*N* = 88). Light pink: rheumatoid arthritis (*N* = 78). Purple: type II diabetes (*N* = 73). Light turquoise: neuroticism (*N* = 69). Light grey: Crohn's disease (*N* = 67).

Here we summarize the advances made in recent years towards unraveling the mechanisms of non-coding BP variants in disease progression with the resources mentioned above. We focus on integrative approaches that aim to prioritize BP-associated SNPs located in regulatory regions of the genome for follow-up studies (**Figure 2**). Genetic and molecular aspects of hypertension have been reviewed previously by others (47, 48).

### Integrative Approaches Using -Omics Data

Remarkable advances have been made recently towards a better understanding of BP genetics, the biology of disease and translation towards new therapeutics, boosted by the widespread application of high-throughput genotyping technologies. At the same time, most BP-associated variants are non-coding, which makes converting statistical associations into target genes a great challenge. SIFT (49, 50), PROVEAN (51), PolyPhen (52), CONDEL (53) and, more recently, CADD (54) are scoring algorithms developed for predicting variant effects; the first four focus on amino acid changes, whereas CADD also scores non-coding variants. Only 98 of the 905 lead BP-associated SNPs have a CADD score above 12.37 (**Table S2**), a threshold suggested by Kircher et al. as deleterious (54). However, the causal variant within a locus might have a different CADD score from the lead SNP, and pinpointing the mechanisms disturbed by the variation remains a challenge.
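Filtering lead SNPs by a CADD cutoff, as in the 98-of-905 count above, is a simple threshold step. The Python sketch below assumes the scores are already available per SNP and treats 12.37 as a cutoff on the scaled (PHRED-like) score; the function name, SNP identifiers and scores are hypothetical placeholders. As the text notes, a fuller analysis would score all variants in LD with each lead SNP, not only the lead SNP.

```python
CADD_THRESHOLD = 12.37  # cutoff quoted in the text; assumed to be a scaled score


def prioritize_by_cadd(cadd_scores, threshold=CADD_THRESHOLD):
    """Return SNP IDs whose CADD score is strictly above `threshold`,
    ordered from highest to lowest score."""
    passing = [snp for snp, score in cadd_scores.items() if score > threshold]
    return sorted(passing, key=lambda snp: cadd_scores[snp], reverse=True)


# Hypothetical lead SNPs with made-up scores (placeholders, not real rsIDs)
lead_snps = {"snp_a": 15.2, "snp_b": 3.1, "snp_c": 12.37, "snp_d": 22.0}
print(prioritize_by_cadd(lead_snps))  # prints ['snp_d', 'snp_a']
```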

New strategies that use regulatory annotations in disease-relevant tissues have greatly expanded our ability to investigate the processes involved in BP. In particular, annotation of histone modifications and regions of open chromatin allows the identification of actively transcribed regions in specific cell types. Similarly, maps of DNA variants that affect expression in a cell-type-specific manner will be integral to interpreting BP loci. A list of cardiovascular-related cell types studied by the ENCODE Project is presented by Munroe et al. (55). Such data can be integrated with GWAS results using bioinformatics tools (56–58). For instance, FUMA provides extensive functional annotation for all SNPs in associated loci and annotates the identified genes in a biological context (57). FunciSNP investigates functional SNPs in regulatory regions of interest (58). Ensembl's Variant Effect Predictor (VEP) determines the effect of variants on genes, transcripts, protein sequence and regulatory regions, and also outputs SIFT, PolyPhen and CADD scores for each variant, among other information (59). Although such integrative tools are useful for variant prioritization and interpretation, not all of them take tissue specificity into account. RegulomeDB, for example, is a database that annotates SNPs with known and predicted regulatory elements in the intergenic regions of the human genome and calculates a score reflecting the evidence for each SNP's regulatory potential (60). However, this score can only be computed across all available tissue types. In addition, several databases covering a broad range of tissues have been made publicly available since the last update of RegulomeDB and could be included in the tool. Together, these resources have been useful for prioritizing genes and variants in associated loci for functional follow-up experiments in many post-GWAS analyses, and can be applied to the interpretation of BP-associated loci.
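At its core, tissue-specific annotation of GWAS SNPs is an interval-overlap query: does a SNP position fall inside a regulatory region (for example, a histone-mark peak) profiled in a given tissue? The minimal Python sketch below assumes BED-style half-open `[start, end)` intervals with SNP positions in the same coordinate system (VCF positions are 1-based and would need converting), and non-overlapping intervals per chromosome. It is not how FUMA, VEP or RegulomeDB are implemented; all names and regions are hypothetical.

```python
from bisect import bisect_right


def build_index(regions):
    """regions: chrom -> list of (start, end) half-open intervals.

    Assumes intervals on a chromosome do not overlap (merge them first if needed).
    """
    index = {}
    for chrom, intervals in regions.items():
        intervals = sorted(intervals)
        index[chrom] = ([s for s, _ in intervals], [e for _, e in intervals])
    return index


def overlaps(index, chrom, pos):
    """True if pos lies in some [start, end) interval on chrom."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    i = bisect_right(starts, pos) - 1  # last interval starting at or before pos
    return i >= 0 and pos < ends[i]


def annotate_snps(snps, tissue_regions):
    """snps: list of (snp_id, chrom, pos); tissue_regions: tissue -> regions.

    Returns snp_id -> sorted list of tissues with an overlapping region.
    """
    indexes = {t: build_index(r) for t, r in tissue_regions.items()}
    return {
        snp_id: sorted(t for t, idx in indexes.items() if overlaps(idx, chrom, pos))
        for snp_id, chrom, pos in snps
    }


# Hypothetical regulatory regions (e.g. peaks of an active mark) per tissue
tissue_regions = {
    "aorta": {"chr1": [(100, 200), (500, 600)]},
    "liver": {"chr1": [(150, 160)], "chr2": [(10, 20)]},
}
snps = [("snp1", "chr1", 155), ("snp2", "chr1", 550), ("snp3", "chr2", 25)]
print(annotate_snps(snps, tissue_regions))
# {'snp1': ['aorta', 'liver'], 'snp2': ['aorta'], 'snp3': []}
```

Binary search keeps each lookup logarithmic in the number of peaks, which matters when thousands of candidate SNPs are checked against genome-wide peak sets for many tissues.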

### Transcription Regulation: Histone Modifications and Open Chromatin

Because the genomic coordinates of active regulatory elements can be mapped using distinctive chromatin features, characterizing chromatin changes across the genome in specific cell types can identify DNA variants that disturb active regulatory elements. The four core histones, H2A, H2B, H3 and H4, can undergo post-translational modifications such as acetylation or methylation (61). These histone modifications indicate active (euchromatin) or repressed (heterochromatin) chromatin structure, thereby shaping regulation and gene transcription (62, 63). For instance, acetylation of histones H3 and H4 and trimethylation of H3 at lysine 4 (H3K4me3) correlate with gene transcription, whereas H3 methylation at lysine 9 correlates with gene silencing (62, 64). These modifications provide a robust readout of active regulatory positions in the genome and have been used for annotation in several studies (23). Histone modifications influencing arterial pressure have been observed in many tissues, including vascular smooth muscle (65). An updated phenotypic cell-type specificity analysis of the 905 BP loci using the H3K4me3 mark in 125 tissues is shown in **Figure 3**. The most significant cell types are cardiovascular-related (**Supplemental Methods**, **Table S3**). Other highly ranked tissues are smooth muscle, fetal adrenal gland, embryonic kidney cells, CD34 cells and stem-cell-derived CD56+ mesoderm cultured cells. These results are consistent with analyses using DNase I hypersensitivity sites (DHSs), which indicate likely transcription factor binding sites, and add further evidence that BP loci are enriched in regions of open chromatin (19, 20, 23, 33) (**Figure S1**) that regulate transcription in a broad range of tissues.
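The method behind **Figure 3** is described in the Supplemental Methods, which are not reproduced here. As a simplified stand-in, the Python sketch below ranks tissues by enrichment of trait SNPs in a chromatin mark using a one-sided binomial test against a background overlap rate, assumed to come from matched control SNPs. It treats SNPs as independent, which ignores LD, so it illustrates the idea of cell-type specificity ranking rather than the authors' analysis. All names and counts are hypothetical; a production analysis would more likely use a library routine such as SciPy's binomial survival function.

```python
from math import exp, lgamma, log, log1p


def binomial_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed in log space to avoid overflow."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    log_norm = lgamma(n + 1)
    total = 0.0
    for i in range(max(k, 0), n + 1):
        log_pmf = (log_norm - lgamma(i + 1) - lgamma(n - i + 1)
                   + i * log(p) + (n - i) * log1p(-p))
        total += exp(log_pmf)
    return min(total, 1.0)


def rank_tissues(overlap_counts, n_snps, background_rates):
    """Rank tissues by enrichment of trait SNPs in a chromatin mark.

    overlap_counts:   tissue -> number of the n_snps trait SNPs in the mark
    background_rates: tissue -> fraction of matched control SNPs in the mark
    Returns (tissue, fold_enrichment, one-sided p) tuples sorted by p.
    """
    results = []
    for tissue, k in overlap_counts.items():
        p0 = background_rates[tissue]
        fold = (k / n_snps) / p0
        results.append((tissue, fold, binomial_sf(k, n_snps, p0)))
    return sorted(results, key=lambda r: r[2])


# Hypothetical counts for 100 trait SNPs in two tissues
ranking = rank_tissues({"heart": 30, "liver": 12}, 100,
                       {"heart": 0.15, "liver": 0.12})
for tissue, fold, pval in ranking:
    print(f"{tissue}: fold={fold:.2f}, p={pval:.2e}")
```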

### Methylation

In addition to histone modifications that promote transcription, BP loci have also been studied for enrichment in DNA methylation, which is known to have the opposite regulatory effect. Methylation of CpG sites, such as those clustered in promoter CpG islands, affects the binding of transcription factors, resulting in gene silencing (66, 67). Abnormal CpG methylation is found in hypertension (68–70) and in many other complex diseases (71, 72). Recently, Kato et al. identified a ~2-fold enrichment of associations between BP variants and local DNA methylation (19). The study also demonstrated that DNA methylation in blood correlates with methylation in several other tissues. These observations add to previous indications of a role for DNA methylation in regulating BP.

### Measuring the Impact of BP Risk Alleles on Gene Expression: eQTLs

Expression quantitative trait loci (eQTLs) are genomic regions harbouring variants whose genotypes correlate with differences in gene expression (73). Linking transcription levels to complex traits has been a follow-up step adopted by many studies (43, 74–76), driven by the growing availability of expression data across tissues and populations (33, 46, 77–81). Warren et al. found that 55.1% of their BP-associated loci contained SNPs with eQTLs in at least one tissue from the GTEx repository (41), with arterial tissue most frequently observed (29.9% of loci had an eQTL in aorta and/or tibial artery) (21). A strong enrichment of eQTLs in artery was also observed by Evangelou et al., who identified 92 novel loci with eQTL enrichment in arterial tissue and 48 in adrenal tissue (33). These studies thus also suggest that BP loci exert their regulatory effects mostly in vascular and cardiac tissues.
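The basic eQTL test is a regression of a gene's expression level on genotype dosage at a nearby variant. The Python sketch below fits that single-variant ordinary least squares model and returns the per-allele effect and its t-statistic. Real pipelines, including those used for GTEx, normalize expression and adjust for covariates such as ancestry components and hidden factors; this sketch omits all of that, and the function name and toy data are hypothetical.

```python
from math import sqrt


def eqtl_regression(dosages, expression):
    """Ordinary least squares of expression on genotype dosage (0/1/2).

    Returns (beta, t_statistic); beta is the expression change per allele.
    No covariates are modelled in this sketch.
    """
    n = len(dosages)
    if n != len(expression) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mx = sum(dosages) / n
    my = sum(expression) / n
    sxx = sum((x - mx) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("variant is monomorphic in this sample")
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosages, expression))
    beta = sxy / sxx
    alpha = my - beta * mx
    rss = sum((y - alpha - beta * x) ** 2 for x, y in zip(dosages, expression))
    se = sqrt(rss / (n - 2) / sxx)  # standard error of the slope
    t_stat = beta / se if se > 0 else float("nan")  # perfect fit: t undefined
    return beta, t_stat


# Hypothetical normalized expression for six individuals
beta, t = eqtl_regression([0, 0, 1, 1, 2, 2], [1.0, 1.2, 1.5, 1.7, 2.0, 2.2])
print(f"beta={beta:.2f}, t={t:.1f}")
```

Each extra copy of the allele raises expression by about 0.5 units in this toy example; a locus "has an eQTL in a tissue" when such an association passes that study's significance threshold in that tissue's data.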

### Finding the Targets: Chromosome Conformation Capture Techniques

Mapping variants to target genes is one of the greatest challenges in the post-GWAS era, and different strategies have been developed to this end (82). One approach is the use of chromosome conformation capture techniques [3C (83), 4C (84, 85), Hi-C (86, 87)]. These techniques capture chromosome interactions (88), yielding networks of interacting genetic loci (84, 85).

Warren et al. used this resource to investigate the target genes of non-coding SNPs, using Hi-C data from human umbilical vein endothelial cells (HUVECs). Potential distal target genes were found at 21 loci, and pathway analysis showed these genes to be enriched for regulators of cardiac hypertrophy (20). Kraja et al. also explored long-range chromatin interactions using endothelial precursor cell Hi-C data (89, 90), linking an associated locus to a gene known to affect cell growth and death (91). More recently, Evangelou et al. used Hi-C chromatin interaction data from HUVECs (92), neural progenitor cells (NPCs), mesenchymal stem cells (MSCs) and aorta and adrenal gland tissue (93) to identify distal affected genes. They found 498 novel loci containing a potential regulatory SNP, and in 484 of these loci long-range interactions were found in at least one cell type (33).
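Conceptually, Hi-C-based variant-to-gene mapping asks which gene promoters fall in genomic bins that significantly contact the bin containing a SNP. The Python sketch below shows that lookup for intra-chromosomal contacts at a fixed bin resolution. Representing a promoter by the bin holding the gene's transcription start site, and using only distal contacts, are simplifying assumptions; the function name, genes and contacts are hypothetical, and published pipelines (for example the ones cited above) differ in how contacts are called and how promoters are defined.

```python
def hic_target_genes(snps, contacts, promoters, resolution):
    """Map SNPs to genes whose promoters contact the SNP's Hi-C bin.

    snps:       list of (snp_id, chrom, pos)
    contacts:   iterable of (chrom, bin_a, bin_b) significant intra-chromosomal contacts
    promoters:  list of (gene, chrom, tss_pos); a promoter's bin holds the TSS
    resolution: bin size in base pairs
    Only distal contacts are used; genes in the SNP's own bin are not added.
    """
    promoter_bins = {}  # (chrom, bin) -> genes whose TSS falls in that bin
    for gene, chrom, tss in promoters:
        promoter_bins.setdefault((chrom, tss // resolution), set()).add(gene)

    partners = {}  # (chrom, bin) -> contacting bins; contacts are symmetric
    for chrom, a, b in contacts:
        partners.setdefault((chrom, a), set()).add(b)
        partners.setdefault((chrom, b), set()).add(a)

    targets = {}
    for snp_id, chrom, pos in snps:
        genes = set()
        for other in partners.get((chrom, pos // resolution), ()):
            genes |= promoter_bins.get((chrom, other), set())
        targets[snp_id] = genes
    return targets


# Hypothetical 10-kb bins, one significant contact between bins 1 and 5 on chr1
promoters = [("GENE_A", "chr1", 55_000), ("GENE_B", "chr1", 12_000),
             ("GENE_C", "chr2", 5_000)]
contacts = {("chr1", 1, 5)}
snps = [("s1", "chr1", 15_500), ("s2", "chr1", 58_000), ("s3", "chr2", 3_000)]
print(hic_target_genes(snps, contacts, promoters, resolution=10_000))
```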

A list of human Hi-C data available for BP-relevant tissues is presented in **Table S4**. An updated variant-to-gene mapping using these chromatin conformation data is shown in **Table S5**. Promoter regions of 1,941 genes were found to interact with the 27,649 candidate SNPs (the 905 BP-associated SNPs and variants in their vicinity) (**Supplemental Methods**, **Figure 4**). Integration with eQTL data from relevant tissues confirmed 209 of the mapped genes and added 483 further genes. One main goal of understanding the biological mechanisms of GWAS associations and the affected genes is to be able to target them therapeutically. Assessing the druggability of a BP-associated locus depends on several factors, but overlap of these results with a recent druggability effort suggests that 517 of these 2,424 genes are druggable (94), and 35 mapped genes are also predicted to interact with common antihypertensive drugs (**Table S2**, **Figure 4**, **Supplemental Methods**). Interestingly, 1,774 of the mapped genes are physically located outside BP-associated loci. These results support the hypothesis that BP GWAS loci act on tissue-specific regulatory gene networks. Importantly, they also show that long-range chromatin interaction maps can identify candidate target genes even outside the risk locus.
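The gene counts above follow from simple set operations on the Hi-C-mapped and eQTL-mapped gene lists. The Python sketch below reproduces that bookkeeping on toy sets; the gene sets, the druggable list and the set of genes inside risk loci are hypothetical inputs, not the review's data.

```python
def integrate_gene_sets(hic_genes, eqtl_genes, druggable, genes_in_loci):
    """Summarize Hi-C and eQTL gene mapping with simple set operations."""
    mapped = hic_genes | eqtl_genes
    return {
        "hic_mapped": len(hic_genes),
        "eqtl_confirmed": len(hic_genes & eqtl_genes),
        "eqtl_added": len(eqtl_genes - hic_genes),
        "total_mapped": len(mapped),
        "druggable": len(mapped & druggable),
        "outside_loci": len(mapped - genes_in_loci),
    }


summary = integrate_gene_sets(
    hic_genes={"G1", "G2", "G3"},
    eqtl_genes={"G3", "G4"},
    druggable={"G2", "G4", "G9"},
    genes_in_loci={"G1"},
)
print(summary)
```

Applied to the counts in the text, the same logic gives 1,941 Hi-C-mapped genes plus 483 genes added only by eQTL mapping, or 2,424 mapped genes in total, the denominator for the 517 druggable genes.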

# Discussion and Conclusions

GWAS have pinpointed over 900 loci associated with BP, and increasing sample sizes have proven crucial for identifying more signals (33). However, further efforts are needed to translate these results into biological inferences on causal mechanisms and an understanding of disease biology. Integrating data beyond the DNA sequence is crucial for identifying genes involved in BP that are regulated by epigenetic mechanisms.

BP variants show eQTL, histone modification and open chromatin enrichment in a broad range of tissues, mostly vascular and cardiac. As the interplay of regulatory elements is highly cell-type specific, studies of changes that influence chromatin structure and accessibility need to be extended to a broad range of tissues and conditions, including disease and its stages. Rosa-Garrido et al. observed chromatin structural abnormalities when comparing healthy and diseased cardiac myocytes, concluding that heart failure involves altered enhancer-gene interactions (95). Thus, alterations in chromatin structure underlying heart disease perturb significant interactions that contribute to gene expression. This finding suggests that high-resolution chromatin conformation and epigenetic data in the disease state can help explain how regulatory variants confer disease risk. The availability of data in different populations will also allow fine-mapping and functional annotation across ethnic groups.

By mapping BP-associated variants to genes using chromosomal conformation maps in specific cell types, we identified 1,941 genes, of which 209 are also supported by eQTL mapping. Of all mapped genes (*n* = 2,424), 517 are predicted to be druggable and 35 are predicted to interact with common antihypertensive drugs. These include successful cases such as the *APOB* gene, predicted to be targeted by irbesartan, an angiotensin II receptor antagonist used mainly to treat hypertension (96). Interestingly, this analysis also identified the *ABCC9* gene by both eQTL and Hi-C mapping; this gene interacts with minoxidil. Although minoxidil was originally developed as an antihypertensive vasodilator, its side effects limited its use, and it is currently applied mainly topically to treat hair loss (97, 98). This highlights the many factors involved in the druggability of a target and the need for extensive validation and trials. Even when *in silico* evidence supports a plausible mechanism for an association, definitive assignment of function to putative cis-regulatory elements requires perturbation of these elements. Although most associated variants have only modest effects on risk, a growing number of studies suggest that combinations of SNPs are frequently needed to explain these effects (99–101). CRISPR–Cas9 (Clustered Regularly Interspaced Short Palindromic Repeats) editing technology (102) permits targeted manipulation of epigenetic mechanisms linked to risk alleles (103). Finally, genes that show consequent differential expression can be further validated *in vivo* using animal models.


In summary, the integrative approaches presented in this review help elucidate the biology underlying GWAS loci by mapping SNPs to genes and determining cell- and tissue-specificity. The increasing availability of regulatory data across a broad range of tissues and disease states will expand the possibilities for integrating and interpreting association results. Studies validating the prioritized genes may identify new drug targets, enabling more effective prevention and treatment of hypertension and its consequences.

### Author Contributions

DH, VT and FA contributed in study conception and design. DH was responsible for analysis and interpretation of data and drafting of manuscript. DH, VT, JS and FA provided critical revision and final approval of the manuscript.

### Funding

FA is supported by a Dekker scholarship (Junior Staff Member 2014T001, Dutch Heart Foundation) and UCL Hospitals NIHR Biomedical Research Centre. Part of this work is funded through the Innovative Medicines Initiative 2 Joint Undertaking under grant agreement no 116074, BigData@Heart.

### Supplementary Material

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fcvm.2018.00025/full#supplementary-material


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2018 Hemerich, van Setten, Tragante and Asselbergs. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Roles of the Chr.9p21.3 *ANRIL* Locus in Regulating Inflammation and Implications for Anti-Inflammatory Drug Target Identification

*Ghazal Aarabi 1, Tanja Zeller 2,3, Guido Heydecke 1, Matthias Munz 4,5,6, Arne S. Schäfer 4 and Udo Seedorf 1\**

*1 Department of Prosthetic Dentistry, Center for Dental and Oral Medicine, University Medical Center Hamburg-Eppendorf, Hamburg, Germany, 2 Department of General and Interventional Cardiology, University Heart Center Hamburg (UHZ), University Medical Center Hamburg-Eppendorf, Hamburg, Germany, 3 Deutsches Zentrum für Herz-Kreislauf-Forschung (DZHK), Partner Site Hamburg/Lübeck/Kiel, Hamburg, Germany, 4 Center of Dento-Maxillo-Facial Medicine, Department of Periodontology and Synoptic Dentistry, Charité – Universitätsmedizin Berlin, corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, Berlin, Germany, 5 Institute for Cardiogenetics, University of Lübeck, Lübeck, Germany, 6 University Heart Center Lübeck, Lübeck, Germany*

### *Edited by:*

*Mete Civelek, University of Virginia, United States*

### *Reviewed by:*

*Tom Robert Webb, University of Leicester, United Kingdom; Yuqi Zhao, University of California, Los Angeles, United States*

> *\*Correspondence: Udo Seedorf u.seedorf@uke.de*

### *Specialty section:*

*This article was submitted to Cardiovascular Genetics and Systems Medicine, a section of the journal Frontiers in Cardiovascular Medicine*

> *Received: 08 March 2018 Accepted: 01 May 2018 Published: 18 May 2018*

### *Citation:*

*Aarabi G, Zeller T, Heydecke G, Munz M, Schäfer A S and Seedorf U (2018) Roles of the Chr.9p21.3*  ANRIL *Locus in Regulating Inflammation and Implications for Anti-Inflammatory Drug Target Identification. Front. Cardiovasc. Med. 5:47. doi: 10.3389/fcvm.2018.00047*

Periodontitis (PD) is a common gingival infectious disease caused by an over-aggressive inflammatory reaction to dysbiosis of the oral microbiome. The disease induces a profound systemic inflammatory host response that triggers endothelial dysfunction and a pro-thrombotic state, and may thus aggravate atherosclerotic vascular disease and its clinical complications. Recently, genome-wide association studies identified a risk haplotype at the *ANRIL*/*CDKN2B-AS1* locus on chromosome 9p21.3 that is associated not only with coronary artery disease / myocardial infarction (CAD/MI) but also with PD. The locus encodes ANRIL, a long non-coding RNA (lncRNA) that, like other lncRNAs, regulates genome methylation by interacting with specific DNA sequences and proteins, such as DNA methyltransferases and polycomb proteins, thereby affecting the expression of multiple genes through *cis* and *trans* mechanisms. Here, we describe ANRIL-regulated genes and metabolic pathways and discuss the implications of these findings for identifying targets of drugs with potential anti-inflammatory activity in general.

Keywords: periodontitis, inflammation, ANRIL, 9p21.3, drug target, anti-inflammatory agents, coronary artery disease, CDKN2B-AS1

# Introduction

Periodontitis (PD) is an inflammatory disease that involves the osseous, connective, and epithelial tissues surrounding the teeth (1). Bacteria attached to the teeth along the gingival margin form a biofilm, which may trigger an immune response in the adjacent gingival tissue. If the biofilm is not removed and persists, it can induce gingivitis, characterized by swelling, redness, and bleeding (2). If the bacterial biofilm and the accompanying inflammatory reaction migrate apically along the root surface and penetrate into the tooth-supporting structures, the gingival inflammation becomes PD (3), which exists in two forms: chronic periodontitis (CP) and a more severe, early-onset form called aggressive periodontitis (AgP) (4). In the US, almost 50% of adults aged 30 years or older have CP, including 30% with moderate and 8.5% with severe PD (5). Compared with CP, AgP is less frequent (prevalence: <0.1%). PD is a complex inflammatory disease that is influenced considerably by interactions between environmental, lifestyle, and genetic factors. Some individuals develop PD at a young age, although their lifestyle habits and environmental context are similar to those of individuals who do not develop the disease. Therefore, an early age at disease onset is considered to often indicate a genetic predisposition (6). The genetic susceptibility to PD has been examined extensively by GWAS (7–10), and seven common variants were identified, three of which met the genome-wide significance threshold. Of these three, one (*GLT6D1*, glycosyltransferase 6 domain containing 1) is specific to AgP, whereas the other two (*SIGLEC5*, sialic acid binding Ig like lectin 5; *DEFA1A3*, defensin alpha 1/alpha 3) are associated with both AgP and CP (8, 10, 11). However, to date, no common or rare alleles have been associated with CP alone at genome-wide significance.
These non-significant findings are thought to be due to the small sample sizes employed. Nevertheless, some loci provide suggestive evidence of association with PD. This evidence is based on independent replication in samples of the same disease phenotype with sufficient statistical power, independent validation of the associations in samples of different disease manifestations, such as AgP and CP, and independent identification through different unbiased systematic approaches. According to these criteria, the following loci, in addition to *GLT6D1*, *SIGLEC5*, and *DEFA1A3*, may currently be considered to be associated with CP and/or AgP: *ANRIL* (antisense noncoding RNA in the INK4 locus), *NPY* (neuropeptide Y), *PF4* (platelet factor 4), *PLG* (plasminogen), *VAMP3* (vesicle associated membrane protein 3) (10, 12–20).

Results obtained from longitudinal epidemiological studies support an association between CAD and CP (21), although the causal relationship between CAD and CP has remained ambiguous (22). Interestingly, variants at *ANRIL*, *PLG*, and *VAMP3* have been reported to be associated with periodontal phenotypes and also with CAD [recently reviewed in ref. (23)]. Of these, *ANRIL* is the most significant risk locus for CAD, and the association of *ANRIL* with PD has been replicated repeatedly. In this narrative review, we summarize recent publications on the impact of this locus on chronic inflammation and discuss potential approaches and strategies to identify new drug targets for anti-inflammatory therapies in general.

# The Chr.9p21.3 Risk Region Is Shared Between Periodontitis and CAD/MI and Affects Gene Expression of Multiple Genes in Different Cell Types

The 9p21.3 risk haplotype at *ANRIL/CDKN2B-AS1* was initially identified by GWAS of CAD (24) and shortly afterwards identified by Schaefer et al. as one of the first genetic risk factors of AgP (17, 25–27) [see **Table 1** for a comparison of the association statistics of the relevant 9p21.3 lead SNPs related to AgP and coronary heart disease].

The ~50 kb core risk haplotype that is shared between CAD/MI and PD encodes the 3' end of a long ncRNA called "antisense non-coding RNA in the INK4 locus" (*ANRIL*, also designated CDKN2BAS) (17, 25). Its sequence is oriented antisense relative to cyclin-dependent kinase inhibitor 2B (*CDKN2B*), which is located adjacent to the core CAD/PD region. Together with *CDKN2A*, which is located further upstream of *ANRIL*, this region harbors a hotspot for multiple complex human diseases and traits (28). Adjacent to it is a tightly linked locus for diabetes (29), which is associated with neither CAD (29) nor PD (17).

Given the extended region of high linkage disequilibrium at the 9p21.3 locus and the large number of transcriptional regulatory elements present in the CAD risk region, it is currently not entirely clear whether the risk of CAD and PD is mediated solely by ANRIL or whether its neighbors, CDKN2B and CDKN2A, which are two well-known tumor suppressor genes involved in cell cycle arrest and malignant transformation in certain cancers (30), also contribute to the mechanism. Knockout mice lacking CDKN2B not only develop a cancer-related phenotype but also advanced aneurysms, accelerated smooth muscle cell apoptosis, and medial arterial thinning (31), suggesting that CDKN2B may be involved in vascular disease as well as in cancer. The CAD risk SNP rs1537373 affects CDKN2B expression in human coronary artery smooth muscle cells, aorta, and the mammary artery (32), and CDKN2B has been shown to regulate inflammatory cytokine production and the clearance of smooth muscle cell-derived apoptotic bodies during atherosclerosis (33). Miller et al. (32) recently investigated the role of SNP rs1537373 in the expression of *ANRIL*. This variant resides in a large haplotype block of linked variants, including the highly replicated CAD SNP rs4977574 and the CAD and PD lead SNP rs1333049 (17, 34). Although rs1537373 does not affect a known transcription factor binding motif, it is located at a site of accessible chromatin. Allele-specific transcription factor binding and histone H3 lysine 27 acetylation around rs1537373 indicated that the native chromatin structure may be affected by the genotype, which was consistent with the observed *cis* eQTL affecting CDKN2B rather than ANRIL in aortic tissues (32). Notably, SNP rs1537373 had previously also been shown to be strongly associated with coronary artery calcification (35). When Cdkn2a-deficient murine bone marrow was transplanted into atherosclerosis-prone Ldlr(-/-) mice, the recipients exhibited accelerated atherosclerosis, a higher number of pro-inflammatory monocytes, and increased monocyte/macrophage proliferation compared to controls (36). Thus, in addition to CDKN2B, CDKN2A is also a plausible contributor to the pathogenesis of vascular inflammation [see the review by Hannou et al. (37) for further information].

*Association statistics of three haplotype-tagging SNPs at the relevant chromosome 9p21.3 risk region, multiplicative model adjusted for smoking, diabetes, and gender in a logistic regression model. AgP, aggressive periodontitis (generalized); CHD, coronary heart disease (disease onset <55 years); OR, odds ratio; CI, confidence interval; P, P-value obtained from a Wald test; N, number of cases/controls. Data extracted from ref. (17).*

The location of the core CAD/MI and PD risk haplotype at the 3' end of *ANRIL* implies that the encoded long ncRNA is a prime functional candidate for the risk-mediating mechanism(s). *ANRIL* is a lowly expressed gene consisting of 20 exons whose transcripts have been detected in a wide variety of cell types and tissues, including smooth muscle cells, endothelial cells, and cells of the immune system that are known to be involved in atherogenesis (29, 38, 39). Originally, two splice variants were demonstrated in normal human testis, and PCR with primers derived from exons 14–16 also yielded signals in a range of other tissues (40). Subsequently, many additional splice variants were identified in various cell types (38, 41, 42). ANRIL is subject to a complex pathway of alternative splicing, which may differ from tissue to tissue and may be influenced by SNPs that interfere with the function of splice signals.

ANRIL expression was reported to be tightly linked to the *ANRIL* genotype due to disruption of an inhibitory STAT1 binding site in risk allele carriers (43), which would be expected to impair the IFNγ signaling response. However, results published by Almontashiri et al. argued against an involvement of IFNγ in the mechanism underlying the association of the 9p21.3 genotype with CAD risk (44). The CAD risk allele of SNP rs564398, one of the SNPs most strongly correlated with ANRIL expression, was predicted to disrupt a binding site for Ras responsive element binding protein 1 (RREB1) in the 9p21.3 locus (45, 46). RREB1 may be involved in up-regulating CDKN2B in a Ras-dependent manner by down-regulating ANRIL. Besides stimulating vascular smooth muscle cell (VSMC) senescence, Ras has also been implicated in atherogenesis through effects on vascular inflammation (47). The local functional influence of variants in the 9p21.3 region on gene expression has been examined by many other studies in a variety of tissues and cells (41, 45, 48–52). The results confirmed that the CAD risk variants in the 9p21.3 region are strongly associated with *ANRIL* expression and, much more moderately, with expression of the adjacent loci (*CDKN2A*, *CDKN2B*). However, the direction of the effect has been inconsistent. Earlier studies suggested associations between CAD risk variants and lower *ANRIL* expression in vascular smooth muscle cells, whole blood cells, and purified peripheral blood T-cells (49, 53, 54). In contrast, the study by Holdt et al. (51), which specifically measured the long ANRIL transcript (ENST00000428597), demonstrated that the CAD risk haplotype was associated with higher *ANRIL* expression in whole blood cells and peripheral blood mononuclear cells. Zhao et al. also found higher expression of this transcript in transformed B-lymphocytes collected from genotyped donors who carried the CAD risk variant rs7865618 (55). 
In the latter study, all assayed CAD risk variants showed the same direction of effect.

In addition to the linear form of ANRIL, there is also a circular form of ANRIL RNA (38). Recently, Holdt et al. (56) showed that circular ANRIL may be athero-protective by regulating rRNA maturation. In their model, pescadillo homologue 1 (PES1, a 60S pre-ribosomal assembly factor) binds to circular ANRIL, which impairs ribosome biogenesis and exonuclease-mediated pre-rRNA processing. The resulting nucleolar stress activates p53, which triggers apoptosis and inhibits proliferation, thereby preventing the accumulation of vascular smooth muscle cells and foam cells at the sites of atherosclerotic lesions. The balance between atherogenic linear and athero-protective circular ANRIL may be critical for the impact of ANRIL on disease progression. However, a recently published study reached the opposite conclusion, namely that circular ANRIL may be pro-atherogenic (57). In this study, circular antisense ANRIL was used to investigate the inflammatory response of vascular endothelial cells *in vivo* in a rat model of coronary atherosclerosis, which was established by injecting rats fed a high-fat diet with vitamin D3 (57). Circular antisense ANRIL lowered circular ANRIL (cANRIL) levels in vascular endothelial cells, along with the levels of several pro-atherogenic markers (serum cholesterol, triglycerides, LDL, IL-1, IL-6, MMP-9, CRP, Bax, caspase-3) and the rate of endothelial cell apoptosis, while HDL levels and bcl-2 expression increased. Conversely, induction of circular ANRIL expression promoted atherosclerosis by increasing pro-inflammatory properties in vascular endothelial cells and by raising serum lipid and pro-inflammatory cytokine levels. These results were consistent with the hypothesis that inhibiting circular ANRIL expression is anti-inflammatory and reduces vascular endothelial cell apoptosis, which in turn protects against atherosclerosis in this animal model.

Earlier studies demonstrated that the epigenetic silencers polycomb repressive complexes 1 and 2 (PRC1 and PRC2) and the PRC-associated activating proteins RYBP and YY1 can bind to ANRIL (58, 59), suggesting that ANRIL may modulate the epigenetic regulation of target gene expression in *cis* and *trans*. Inducible knock-down experiments in T-Rex 293 HEK cells demonstrated *in vitro* that silencing of two proximal *ANRIL* transcripts altered the expression of *ADIPOR1*, *VAMP3*, and *TMEM258* (60) (see **Table 2** for a list of genes regulated by *ANRIL*). ADIPOR1 is a high-affinity receptor for globular adiponectin, which is involved, among other pathways, in PPARα (peroxisome proliferator activated receptor alpha) and AMPK (AMP-activated protein kinase) signaling (62). PPARα activation prevented experimentally induced bone loss in animal studies (63). AMPK and PPARα act as key regulators of glucose and fatty acid metabolism in the liver. Adiponectin levels are inversely correlated with BMI, body fat, and severity of CAD (64). Globular adiponectin also increases insulin sensitivity by stimulating cellular glucose uptake through increased recruitment of glucose transporter 4 (GLUT4) to the plasma membrane and induction of *GLUT4* expression (65). Besides these metabolic roles, adiponectin also has anti-inflammatory activity: it activates tissue inhibitors of metalloproteinases and IL-10, and it suppresses lipopolysaccharide-activated *TNF* (tumor necrosis factor) expression and phagocytic activity (66, 67). The effect of ANRIL on *VAMP3* expression (**Table 2**) may be important, because VAMP3 belongs to the VAMP/synaptobrevin family, which is involved in phagocytosis and in the trafficking of TNF-α-containing secretory vesicles to the cell surface that is required for TNF-α secretion (68).

### Table 2 | *ANRIL*-Regulated Genes

*CAD, coronary artery disease; CVD, cardiovascular disease; HUVEC, human umbilical vein endothelial cells; IL, interleukin; PBMC, peripheral blood mononuclear cells; VSMC, vascular smooth muscle cells*

Genome-wide *cis* and *trans* effects of the variants in the 9p21.3 region on gene expression were recently studied by Zhao et al. (55), who applied the SNP-set (Sequence) Kernel Association Test [SKAT, (69)] to genotyped transformed B-lymphocytes collected from 801 participants of the Genetic Epidemiology Network of Arteriopathy (GENOA) study. The results demonstrated a significant association between the CAD and PD risk variants in the region and the expression of the long linear *ANRIL* transcript, which comprises all 20 exons except exon 13. In addition to this *cis*-regulatory effect, several *trans* eQTLs were also identified (**Table 2**). The affected genes were *DUT* (deoxyuridine triphosphatase, also known as dUTPase), *EIF1AY* (eukaryotic translation initiation factor 1A, Y-linked), *CASP14* (caspase 14), *ABCA1* (ATP-binding cassette transporter A1), and *DHRS9* (dehydrogenase/reductase 9) (55).

The *DUT* gene product is an essential enzyme of nucleotide metabolism that hydrolyzes dUTP into dUMP and inorganic pyrophosphate. The enzyme plays an important role in controlling the relative cellular levels of dUTP and dTTP (70). Lack or inhibition of dUTPase results in elevated levels of uracil in DNA, which triggers DNA repair and may induce the formation of DNA double-strand breaks, somatic mutations, and apoptosis (71).

*CASP14* is involved in apoptosis and is over-expressed in skin, the oral epithelium, bone, heart, and epithelial tumors (72). *EIF1AY* encodes a translation initiation factor that seems to be required for a maximal rate of protein biosynthesis (73), and *DHRS9* is involved in retinol and steroid metabolism (74). *ABCA1* plays a well-known role in atherosclerosis (75), but its contribution to PD is unclear. It has been proposed that LPS from *P. gingivalis*, which is the most important pathogen involved in PD, may suppress *ABCA1* expression during periodontitis via miRNA-mediated mechanisms (76). To further investigate the potential biological implications of the *trans*-affected genes, Zhao et al. (55) performed a gene enrichment analysis based on the KEGG pathway database. The enriched pathways included "retinol metabolism", "TGF-β signaling", and "N-glycan biosynthesis". Retinol metabolism was at the top of the list of enriched pathways, in which *LRAT* (lecithin retinol acyltransferase), *ADH1* (alcohol dehydrogenase 1), *DHRS9*, *DHRS4L2* (dehydrogenase/reductase 4 like 2), and *CYP26B1* (cytochrome P450 retinoid metabolizing protein) were significantly associated. The importance of TGF-β signaling in the pathogenesis of PD is well established, since anti-TGF-β antibodies can inhibit the recruitment of leukocytes and the destruction of cartilage and bone at the periodontal lesion sites during periodontitis (77). Another reported downstream target regulated by *ANRIL* is *CARD8* (caspase recruitment domain-containing protein 8) (**Table 2**) (61). The *CARD8* SNP rs2043211 is significantly associated with ischemic stroke, but its involvement in PD is unclear. The *CARD8* gene product is, together with other proteins, a component of the inflammasome. ANRIL is induced by pro-inflammatory factors, such as TNFα and IFN-γ, via activation of NF-κB (**Figure 1**) (78). The transcription factor yin yang 1 (YY1) can bind to ANRIL, and the ANRIL-YY1 complex interacts with the promoter of *IL6/8* to activate the expression of *IL6* and *IL8*, two cytokines with well-established roles in CAD/MI and PD.

Taken together, these findings suggest that *ANRIL* exerts its effects through epigenetic regulation of a great variety of target genes. The common theme seems to be its involvement in regulating the expression of genes that play important roles in inflammation, immunity, apoptosis and cell survival, cell proliferation, and metabolism. Many of the reported *trans*-regulated genes clearly have plausible roles in CAD and PD as well. Nevertheless, at this stage, we find it premature to formulate a unifying theory that would be consistent with at least the majority of the findings. Most concerning is the apparent complete lack of replication of *trans*-regulated genes between the published studies. The reasons for this striking inconsistency may lie in the diversity of the experimental approaches and cell types employed to date. The genome-wide approaches may lack sufficient power to detect some of the differentially expressed genes identified by targeted strategies (55). Antisense approaches are difficult to control owing to the complex cell-type-specific alternative splicing pathways (38, 41, 42), and findings from rodent models may not be relevant for humans, since rodent and human ANRIL are poorly conserved evolutionarily and differ substantially in structure (79).

FIGURE 1 | Hypothetical roles of linear and circular ANRIL lncRNA in regulating inflammation and cell survival in human vascular endothelial cells and potential drug targets. TNF-α triggers NF-κB activation, which induces *ANRIL* transcription (66). Linear ANRIL can be converted to circular ANRIL (38). Linear ANRIL interacts with the transcription factor yin yang-1 (YY1) to form a functional complex that binds to and regulates the expression of target genes such as IL-6/8. Circular ANRIL interacts with pescadillo homologue 1 (PES1) to form a complex with the pre-ribosomal assembly complex that impairs ribosome biogenesis, leading to activation of p53 and a subsequent increase in apoptosis and decrease in the proliferative rate (41). This pathway may promote atheroprotection by eliminating overproliferating cells in atherosclerotic plaques. Neither TNFα nor NF-κB antagonists seem suitable for widespread use in anti-inflammatory therapies of PD or CAD because of their serious side effects. Since ANRIL is located downstream of TNFα and NF-κB, ANRIL or its downstream targets may be better suited as drug targets to inhibit the pro-inflammatory activities linked to this signaling pathway [modified according to ref. (78)].

# Implications of the Chr.9p21.3 *ANRIL* Locus for Drug Target Identification

Zhou et al. (78) showed that *ANRIL* expression is up-regulated via the TNFα/NF-κB signaling pathway under inflammatory stress conditions (**Figure 1**). Since endothelial cell-specific inhibition of NF-κB protects mice from atherosclerosis (80), and since *ANRIL* is a downstream target of TNFα/NF-κB signaling, targeting TNFα or NF-κB could theoretically be athero-protective by inhibiting ANRIL-YY1-mediated IL-6/8 production. Several TNFα receptor antagonists (mostly antibodies) have been tested for safety and efficacy in modulating pro-inflammatory cytokine release in the treatment of rheumatoid arthritis (81). However, clinical trials have shown that these receptor antagonists are associated with increased risks of malignancies and serious infections (81). Since *ANRIL* is located downstream of TNFα and NF-κB, it may be better suited as a drug target. However, given the important role of *ANRIL* transcripts in controlling cell growth, its expression is likely to be tightly regulated. A better understanding of the precise downstream effects of the linear and circular *ANRIL* lncRNAs on the expression of genes involved in chronic inflammatory pathways may therefore reveal putative targeting options, and such work has the potential to identify new drug targets for anti-inflammatory intervention.

### Author Contributions

GA and US had the initial idea of writing a review and proposed the topic. They also conducted an extensive literature search and wrote the first draft of the manuscript. TZ, GH, AS, and MM integrated the different information and also wrote and submitted the manuscript.

### Funding

No third party funds were used for this work.

### Acknowledgments

GA, TZ, and GH are employed at and receive salaries from the University Medical Center Hamburg-Eppendorf; US is employed at the University Medical Center Hamburg-Eppendorf and receives his salary from a grant provided by the Else Kröner-Fresenius Foundation; AS and MM are employed at and receive a salary from the Charité – Universitätsmedizin Berlin.




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2018 Aarabi, Zeller, Heydecke, Munz, Schäfer and Seedorf. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Common Methods for Performing Mendelian Randomization

*Alexander Teumer 1,2\**

*1 Institute for Community Medicine, University Medicine Greifswald, Greifswald, Germany, 2 Partner Site Greifswald, Deutsches Zentrum für Herz-Kreislaufforschung (DZHK), Greifswald, Germany*

Mendelian randomization (MR) is a framework for drawing causal inference using cross-sectional data in combination with genetic information. This paper summarizes statistical methods that are commonly applied and straightforward to use for conducting MR analyses, including those that take advantage of the rich dataset of SNP-trait associations revealed in the last decade through large-scale genome-wide association studies. Using these data, powerful MR studies are possible. However, the causal estimate may be biased if the assumptions of MR are violated. The source and type of this bias are described, together with a summary of the mathematical formulas that should help to estimate the magnitude and direction of the potential bias in the specific research setting. Finally, methods for relaxing the assumptions and for conducting sensitivity analyses are discussed. Future research in the field of MR includes the assessment of non-linear causal effects and the automatic detection of invalid instruments.

### *Edited by:*

*Tanja Zeller, Universität Hamburg, Germany*

### *Reviewed by:*

*Bastiaan Geelhoed, University Medical Center Groningen, Netherlands; Joylene Elisabeth Siland, University of Groningen, Netherlands*

> *\*Correspondence: Alexander Teumer ateumer@uni-greifswald.de*

### *Specialty section:*

*This article was submitted to Cardiovascular Genetics and Systems Medicine, a section of the journal Frontiers in Cardiovascular Medicine*

> *Received: 21 February 2018 Accepted: 04 May 2018 Published: 28 May 2018*

### *Citation:*

*Teumer A (2018) Common Methods for Performing Mendelian Randomization. Front. Cardiovasc. Med. 5:51. doi: 10.3389/fcvm.2018.00051*

Keywords: mendelian randomization, causal inference, GWAS, bias, statistical methods

# Introduction

Observational epidemiological studies have made important contributions to our understanding of common diseases by identifying important risk factors. Although causal inference is of major interest, as it forms the basis for intervention and prevention, it is difficult to perform using observational data from cross-sectional studies. Presumed causal relationships have often been revised, e.g., by randomized controlled trials (RCTs) (1). Possible reasons for these contradictory findings include unobserved confounding, reverse causation, and selection bias in the observational studies (2–4).

On the other hand, RCTs are often lengthy and may raise ethical problems. Furthermore, confounding and selection bias can still arise after the initiation of an RCT, for example through non-compliance or through loss to follow-up that depends on the treatment effect, which may lead to data that are missing not at random.

During the last decade, huge efforts have been undertaken to search for genetic risk factors underlying common traits and diseases. Genome-wide association studies (GWAS) have revealed thousands of genetic associations, predominantly based on single nucleotide polymorphisms (SNPs), including more than 950 related to cardiovascular diseases and measurements (as of April 2018), and these have been made publicly available (5). The effect sizes of these associations are often quite small (6–8), so their direct clinical relevance may be questioned. However, these genetic associations can help to draw causal inferences. This approach, in which SNPs are used as instrumental variables (IVs) for specific exposures, is called Mendelian randomization (MR) (9). According to Mendel's laws, alleles of SNPs segregate and are randomly inherited from parents to offspring. This principle is analogous to the randomized treatment assignment in an RCT and results in an unconfounded exposure-outcome relationship. Within an MR approach, the exposure represents a continuous or dichotomous risk factor of a disease, and the outcome is the disease or a disease-related trait. Examples of such traits are blood pressure, which defines hypertension, and the estimated glomerular filtration rate (eGFR), which defines the status of chronic kidney disease. Using the MR approach, causality between exposure and outcome can be tested. In recent years, the number of MR studies assessing causality has increased substantially, including in the fields of cardiovascular diseases and nephrology (10–14). For example, MR analyses revealed causal effects of blood lipids on coronary heart disease (15) as well as of alcohol consumption on cardiovascular traits (16). Given the number of potential genetic instruments and statistical methods available nowadays, there is potential for assessing the causality of many more traits through MR analyses. 
Nevertheless, some important assumptions have to be fulfilled to estimate an unconfounded and unbiased exposure-outcome relationship and thus to allow causal inference. This review describes the assumptions of MR and the potential biases caused by violations of these assumptions, and provides an overview of commonly applied statistical methods for conducting MR analyses using individual-level data as well as GWAS meta-analysis results.

### Estimation of the Causal Effect

The general aim of the MR approach is to estimate the causal effect of an exposure *X* on an outcome *Y* using one or more genetic instruments *Z* for *X* (**Figure 1**). In principle, the causal effect is obtained in two sequential steps. First, the exposure is estimated from its instruments. When valid instruments are used, the estimated exposure is independent of any confounders. In the second step, the outcome is regressed on this estimated exposure, yielding an unconfounded and therefore causal effect estimate. The instrument *Z* is usually coded as 0, 1, or 2 per individual according to the number of coded (e.g., exposure-increasing) alleles that the individual carries.
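To make the two-step logic concrete, the following Python sketch (standard library only) simulates hypothetical individual-level data in which an unobserved confounder *U* affects both *X* and *Y*, and a single SNP *Z* is coded as 0, 1, or 2 allele copies. All parameter values (allele frequency 0.3, per-allele effect 0.4 on *X*, true causal effect 0.5) are illustrative assumptions, not values from any study. The naive regression of *Y* on *X* is biased by *U*, whereas regressing *Y* on the fitted exposure $\hat{X}$ approximately recovers the true effect. Only the point estimate of this manual two-step fit is meaningful; its OLS standard error is not valid.

```python
import random

def ols_fit(x, y):
    """Simple linear regression of y on x; returns (slope, intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx

# Hypothetical data-generating model (all values are illustrative assumptions)
rng = random.Random(2018)
n, true_beta = 50_000, 0.5
z, x, y = [], [], []
for _ in range(n):
    g = (rng.random() < 0.3) + (rng.random() < 0.3)  # allele count 0/1/2, freq 0.3
    u = rng.gauss(0.0, 1.0)                          # unobserved confounder U
    xi = 0.4 * g + u + rng.gauss(0.0, 1.0)           # exposure X
    yi = true_beta * xi + u + rng.gauss(0.0, 1.0)    # outcome Y
    z.append(g)
    x.append(xi)
    y.append(yi)

# Naive observational estimate: biased upwards because U affects both X and Y
naive_beta, _ = ols_fit(x, y)

# Step 1: estimate the exposure from the instrument (fitted values X-hat)
b_zx, a_zx = ols_fit(z, x)
x_hat = [a_zx + b_zx * g for g in z]

# Step 2: regress Y on X-hat (point estimate only; its OLS SE is not valid)
two_step_beta, _ = ols_fit(x_hat, y)

print(f"naive: {naive_beta:.3f}  two-step: {two_step_beta:.3f}  true: {true_beta}")
```

Under this assumed model the naive slope is expected to be close to 1 (about 0.98), while the two-step estimate should lie close to the true value of 0.5.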

### 2-Stage Least Squares Estimator

Given a continuous outcome *Y* and assuming linear effects between *X* and *Y* without interaction, the causal effect of the exposure *X* on *Y* can be estimated through a 2-stage least squares (2SLS) regression. This method performs both steps described above implicitly. In the first step, the estimated exposure $\hat{X}$, which is independent of the confounders, is obtained from the genotypes of the instruments as the fitted values of the regression of *X* on *Z*. In the second step, the causal effect estimate $\hat{\beta}_{XY}$ is obtained by regressing *Y* on $\hat{X}$. As both steps are performed in a single model instead of two separate regressions, the variation of both *Z* and $\hat{X}$ is taken into account, which is required for obtaining correct standard errors (SE) of $\hat{\beta}_{XY}$ (17). The 2SLS regression can be calculated with standard methods in statistical software packages such as R (18), using the function *tsls* of the package *sem*, or Stata (https://www.stata.com/), using the command *ivregress*. 2SLS was applied in an MR study of testosterone and cardiometabolic risk factors, but the single-study analysis substantially limited the statistical power (19).
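As an illustration of why the SE has to come from the joint model, the following Python sketch computes the just-identified 2SLS estimate for a single instrument with an intercept and no covariates on simulated, hypothetical data. The helper names `simulate` and `tsls` are illustrative and not part of any package; in practice, the R or Stata routines named above would be used. The sketch contrasts the valid 2SLS SE, whose residuals use the observed *X*, with the SE that a manual second-stage OLS of *Y* on $\hat{X}$ would report.

```python
import math
import random

def simulate(n=50_000, true_beta=0.5, seed=7):
    """Hypothetical one-sample data: SNP Z (0/1/2), confounder U, exposure X, outcome Y."""
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        g = (rng.random() < 0.3) + (rng.random() < 0.3)
        u = rng.gauss(0.0, 1.0)
        xi = 0.4 * g + u + rng.gauss(0.0, 1.0)
        z.append(g)
        x.append(xi)
        y.append(true_beta * xi + u + rng.gauss(0.0, 1.0))
    return z, x, y

def tsls(z, x, y):
    """Just-identified 2SLS (one instrument, intercept, no covariates).

    Returns (beta_xy, se, se_naive), where se_naive is the (invalid) SE that a
    manual second-stage OLS of Y on X-hat would report.
    """
    n = len(z)
    mz, mx, my = sum(z) / n, sum(x) / n, sum(y) / n
    szz = sum((a - mz) ** 2 for a in z)
    szx = sum((a - mz) * (b - mx) for a, b in zip(z, x))
    szy = sum((a - mz) * (b - my) for a, b in zip(z, y))
    beta = szy / szx          # identical to the slope of Y on X-hat
    alpha = my - beta * mx
    # Valid 2SLS residuals use the OBSERVED exposure X
    rss = sum((yi - alpha - beta * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) * szz / szx ** 2)
    # Manual second stage: residuals computed with X-hat instead of X
    b_zx = szx / szz
    x_hat = [mx + b_zx * (a - mz) for a in z]
    rss_naive = sum((yi - alpha - beta * xh) ** 2 for xh, yi in zip(x_hat, y))
    se_naive = math.sqrt(rss_naive / (n - 2) * szz / szx ** 2)
    return beta, se, se_naive

z, x, y = simulate()
beta, se, se_naive = tsls(z, x, y)
print(f"2SLS beta = {beta:.3f}, SE = {se:.4f} (manual second-stage SE = {se_naive:.4f})")
```

Both fits give the same point estimate, but under this assumed model the manual second-stage SE is inflated because its residuals absorb the part of *X* that the instrument does not explain.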

### Ratio Estimator

Alternatively, the causal effect can be estimated by triangulation, without the need to calculate $\hat{\beta}_{XY}$ from the exposure-outcome association directly. The principle of this method is illustrated in **Figure 1**: the standard approach (including 2SLS) for obtaining the causal effect $\hat{\beta}_{XY}$ follows the path from the instrument *Z* via *X* to *Y*. In this case, the association $\beta_{ZY}$ of the instrument with the outcome *Y* equals the product of the effects along the path mediated by the exposure, i.e., $\beta_{ZY} = \beta_{ZX} \cdot \hat{\beta}_{XY}$. Rearranging this equation, the causal effect can be estimated by dividing the effect of the IV on the outcome ($\beta_{ZY}$) by the effect of the IV on the exposure ($\beta_{ZX}$): $\hat{\beta}_{XY} = \beta_{ZY}/\beta_{ZX}$. As the triangulation approach calculates the causal effect (and its SE for testing for significant deviation from the null) from the ratio of the two IV-based effect estimates, it is also known as the ratio estimate or Wald estimate. For the computation, it is important that both IV-based effect estimates refer to the same allele of the IV. Furthermore, the same requirements as for 2SLS apply. The SE of $\hat{\beta}_{XY}$ has to be estimated via the delta method, which is based on a Taylor series expansion, and can be approximated as (20):

$$\operatorname{var}\left(\hat{\beta}_{XY}\right) = \operatorname{var}\left(\frac{\beta_{ZY}}{\beta_{ZX}}\right) \cong \frac{\operatorname{var}\left(\beta_{ZY}\right)}{\beta_{ZX}^2} + \frac{\beta_{ZY}^2}{\beta_{ZX}^4} \operatorname{var}\left(\beta_{ZX}\right) - 2\frac{\beta_{ZY}}{\beta_{ZX}^3} \operatorname{cov}\left(\beta_{ZY}, \beta_{ZX}\right)$$

$SE(\hat{\beta}_{XY}) = \sqrt{\operatorname{var}(\hat{\beta}_{XY})}$, where $\operatorname{cov}(\beta_{ZY}, \beta_{ZX})$ is the covariance of the two effect estimates. This term vanishes if the effect estimates are obtained from distinct samples. This concise approximation can easily be implemented for significance testing in statistical software packages such as R or STATA.
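The ratio estimate and its delta-method SE translate directly into a few lines of Python. In this sketch the function name and the example per-allele estimates are hypothetical; the covariance argument defaults to 0, which corresponds to estimates from distinct samples, and the p-value assumes an approximately normal test statistic.

```python
import math


def ratio_estimate(beta_zx, se_zx, beta_zy, se_zy, cov_zy_zx=0.0):
    """Wald ratio beta_ZY / beta_ZX with its delta-method SE and two-sided p-value.

    cov_zy_zx is the covariance of the two SNP estimates; it is 0 when they
    come from distinct samples. Both estimates must refer to the same allele.
    """
    beta_xy = beta_zy / beta_zx
    var_xy = (se_zy ** 2 / beta_zx ** 2
              + beta_zy ** 2 / beta_zx ** 4 * se_zx ** 2
              - 2 * beta_zy / beta_zx ** 3 * cov_zy_zx)
    se_xy = math.sqrt(var_xy)
    p = math.erfc(abs(beta_xy / se_xy) / math.sqrt(2))  # normal approximation
    return beta_xy, se_xy, p


# Hypothetical per-allele estimates from two non-overlapping studies
est, se, p = ratio_estimate(beta_zx=0.2, se_zx=0.02, beta_zy=0.05, se_zy=0.01)
print(f"beta_XY = {est:.3f}, SE = {se:.4f}, p = {p:.2e}")
```

Dropping the second term (i.e., setting `se_zx=0`) gives the simpler first-order SE $SE(\beta_{ZY})/|\beta_{ZX}|$, which ignores the uncertainty in the SNP-exposure estimate.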

In contrast to 2SLS, which has to be performed using data from a single sample (one-sample MR), triangulation can be conducted using different sample sets: the effect estimates of the IV on the exposure *X* and on the outcome *Y* can be obtained from genetic association studies with either disjoint or overlapping samples (two-sample MR). In this way, genetic associations revealed by large-scale GWAS meta-analyses can be used as $\beta_{ZX}$ and $\beta_{ZY}$. These association results are often publicly available for a variety of traits.

The triangulation method can also be applied if the outcome *Y* is dichotomous, i.e., an indicator of disease status. In this case, log-linear effects on *Y* without interaction and an approximately normal distribution of *X* are required. Causal effect estimates on the odds ratio (OR) scale can be calculated by performing a logistic regression analysis with the disease as the outcome, the model that was also applied in most GWAS. To estimate a causal OR using triangulation, the rare disease assumption (i.e., prevalence <10%) has to be fulfilled. Alternatively, a causal risk ratio may be estimated using a log-linear model instead of logistic regression (21). The SE of $\hat{\beta}_{XY}$ (i.e., the log causal OR) is estimated with the same formula as for a continuous outcome. An application of the ratio estimator is provided by the MR on cystatin C and cardiovascular disease (22).
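For a binary outcome, the ratio and its SE are computed exactly as for a continuous outcome, on the log-odds scale; only the final back-transformation differs. A minimal sketch, assuming a log causal OR and its SE are already available (the values and the function name are hypothetical). Under the assumed log-linear model, the OR for a change of k exposure units is the per-unit OR raised to the power k.

```python
import math


def causal_or(log_or_per_unit, se, units=1.0, z_crit=1.959963984540054):
    """Causal OR with 95% CI for a change of `units` in the exposure.

    log_or_per_unit and se are the ratio estimate and its delta-method SE on
    the log-odds scale (computed as for a continuous outcome).
    """
    b, s = units * log_or_per_unit, units * se
    return math.exp(b), math.exp(b - z_crit * s), math.exp(b + z_crit * s)


# Hypothetical log causal OR of 0.25 (SE 0.056) per unit of the exposure
print(causal_or(0.25, 0.056))           # OR, lower and upper limit per 1 unit
print(causal_or(0.25, 0.056, units=2))  # per 2 units of the exposure
```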

### Control Function Estimator

Another method for estimating the causal effect on a dichotomous outcome is the control function estimator (21), a two-step approach. In the first step, the exposure *X* is regressed on the instruments *Z*. The residuals of this regression correspond to the non-instrumented part of the exposure and may therefore correlate with an (unobserved) confounder *U* of the exposure-outcome association. In the second step, a logistic regression of the outcome *Y* on *X* is performed, with the residuals of the first step added as a covariate. Adding these residuals controls for the effects of *U* on *Y*, so the effect of *X* on *Y* in the second regression corresponds to the causal effect estimate. If a linear regression is conducted in the second step (i.e., for a continuous outcome), the control function estimator is equivalent to the 2SLS estimator (21). This type of MR was conducted to assess the causal effect of blood lipids on coronary heart disease (15).
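The equivalence of the control function estimator with 2SLS in the linear case can be checked numerically. This sketch is an illustration under assumptions: it uses simulated data, hypothetical helper names, and a linear second stage, so it demonstrates the equivalence statement rather than the logistic version used for binary outcomes (which would replace the second least-squares fit by a logistic regression).

```python
import random


def ols(x, y):
    """Least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def ols_two_covariates(x1, x2, y):
    """Slopes of y on x1 and x2 (with intercept) via the 2x2 normal equations."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    c1 = [v - m1 for v in x1]
    c2 = [v - m2 for v in x2]
    cy = [v - my for v in y]
    s11 = sum(a * a for a in c1)
    s22 = sum(b * b for b in c2)
    s12 = sum(a * b for a, b in zip(c1, c2))
    s1y = sum(a * c for a, c in zip(c1, cy))
    s2y = sum(b * c for b, c in zip(c2, cy))
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


def control_function(z, x, y):
    # Step 1: regress X on Z; the residuals are the non-instrumented part of X
    a1, b1 = ols(z, x)
    resid = [xi - (a1 + b1 * zi) for zi, xi in zip(z, x)]
    # Step 2 (linear version): regress Y on X, adjusting for the residuals
    beta_x, _ = ols_two_covariates(x, resid, y)
    return beta_x


def tsls(z, x, y):
    a1, b1 = ols(z, x)
    return ols([a1 + b1 * zi for zi in z], y)[1]


# Hypothetical data with an unobserved confounder u and true causal effect 0.5
rng = random.Random(7)
z = [(rng.random() < 0.3) + (rng.random() < 0.3) for _ in range(5000)]
u = [rng.gauss(0, 1) for _ in z]
x = [0.5 * zi + ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
y = [0.5 * xi + ui + rng.gauss(0, 1) for xi, ui in zip(x, u)]
cf, iv = control_function(z, x, y), tsls(z, x, y)
print(cf, iv)
```

The two numbers agree up to floating-point error: because the first-stage residuals are orthogonal to the fitted values, adjusting for them leaves exactly the instrumented part of *X* to explain *Y*.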

### Assumptions of the Instrumental Variables

SNPs have several properties that make them suitable as instruments for the exposure. The inherited alleles are not changed by a disease or trait and thus do not change over time. The random inheritance of the SNP alleles makes the genotype distribution largely independent of socio-economic and lifestyle factors (1, 23). Nevertheless, specific assumptions still need to be fulfilled to ensure the validity of the genetic variant as an instrument. There are three core assumptions for MR (24–26):

1. The IV is associated with the exposure *X*.
2. The IV affects the outcome *Y* only through the exposure *X* (exclusion restriction).
3. The IV is independent of confounders *U* of the exposure-outcome association.

The first condition is required because in MR the (unconfounded) exposure is estimated from the allele distribution of the IVs. This assumption can easily be tested and is considered fulfilled if the SNP-exposure association has an F-statistic >10 (21, 27).
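The F-statistic criterion can be checked from summary statistics or from the variance explained. The sketch below uses two common approximations, which are assumptions here rather than formulas taken from the cited references: for a single SNP, F is approximately the squared ratio of its effect estimate to its SE, and for k instruments explaining a proportion R² of the exposure variance in n individuals, F = (R²/k)/((1 − R²)/(n − k − 1)). Function names are hypothetical.

```python
def f_from_estimate(beta_zx, se_zx):
    """Approximate F-statistic of one SNP: its squared Wald z-statistic."""
    return (beta_zx / se_zx) ** 2


def f_from_r2(r2, n, k=1):
    """F-statistic of the regression of the exposure on k instruments."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))


print(f_from_estimate(0.1, 0.02))  # single SNP from summary statistics
print(f_from_r2(r2=0.01, n=1000))  # one SNP explaining 1% of the variance
```

In the second call, a single SNP explaining 1% of the exposure variance in 1,000 individuals only just exceeds the threshold of 10; with half the sample size it would fall well below it.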

The second assumption, also known as the exclusion restriction, is equivalent to the condition that an IV has no effect on the outcome when the exposure is held fixed. In general, this assumption is hard to validate, as SNPs may have pleiotropic effects, or may be in linkage disequilibrium with variants in genes that affect the outcome independently of the exposure. Even without considering linkage disequilibrium, using SNPs in the pleiotropic gene *GCKR* as instruments for kidney function to assess a causal effect on blood pressure, for example, would result in an invalid IV, as *GCKR* likely has effects on blood pressure that are independent of kidney function, e.g., through its known associations with serum lipid levels. Another violation would occur if the sample contains population substructure with different allele distributions that is also associated with the outcome. In this case, the substructure would be a common cause of both SNP and outcome, opening a pathway from SNP to outcome that is not mediated by the exposure. Several examples of scenarios violating the exclusion restriction are provided in the work of Glymour et al. (24).

The third assumption is also hard to validate. Problems due to pleiotropy and population substructure similar to those described for the exclusion restriction may occur, but affecting confounders of the exposure-outcome relationship instead of the outcome directly. In an example assessing the causal effect of kidney function on heart disease, using *GCKR* as an instrument would violate the third assumption, because these SNPs are also associated with blood pressure, a confounder of the association between kidney function and heart disease.

### Weak Instrument Bias

To date, more than 50,000 SNP-trait associations have been revealed by GWAS; they are usually accessible through public repositories such as the GWAS Catalog (5). These SNPs can be considered as potential instruments for MR analyses. Because most of these SNPs explain only a small proportion (often <1%) of the phenotypic variance, GWAS with sample sizes of more than 10,000 or 100,000 individuals were required to detect these associations at genome-wide significance. However, the small effect sizes of the SNPs on the exposure result in weak instruments when smaller sample sizes are used (28). Weak instruments tend to bias estimated causal effects towards the observational association (27). This bias originates from finite sample sizes: although the IVs are asymptotically independent of confounders, there may still be a chance association in finite samples. Increasing the sample size or the strength of the instruments reduces the weak instrument bias. To illustrate the origin and the effect of the bias, let $\beta_{UX}$ and $\beta_{UY}$ be the effects of the confounder *U* on the exposure and the outcome, respectively (**Figure 1**). Furthermore, let $\Delta U$ be the (chance) difference in *U* depending on the instrument *Z*. The estimated causal effect $\hat{\beta}_{XY}$ can then be expressed as the following sum of effects (27):

$$\hat{\beta}_{XY} = \beta_{\text{causal}} + \frac{\beta_{UY}\,\Delta U}{\beta_{ZX} + \beta_{UX}\,\Delta U},$$

where $\beta_{\text{causal}}$ is the true causal effect, and $\operatorname{mean}(\Delta U) = 0$ because *Z* is an instrument (assumption 3). This drives the bias term towards zero with increasing sample size, resulting in $\hat{\beta}_{XY} = \beta_{\text{causal}}$. The estimated causal effect is also close to the true causal effect if the effect of the IV on the exposure, $\beta_{ZX}$, is large relative to the chance difference in *U* acting on the exposure ($\beta_{UX}\Delta U$). However, if $\beta_{ZX}$ is small compared to $\beta_{UX}\Delta U$ (i.e., for a weak instrument), the bias term approaches the ratio of the effect of the confounder on the outcome to the effect of the confounder on the exposure, i.e., $\beta_{UY}/\beta_{UX}$.
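Evaluating this bias formula for a range of instrument strengths shows how the estimate drifts away from the true effect as the instrument weakens. All numbers below are hypothetical, and the chance imbalance `delta_u` is held fixed to isolate the role of $\beta_{ZX}$.

```python
def estimate_with_imbalance(beta_causal, beta_zx, beta_ux, beta_uy, delta_u):
    """Ratio estimate when the confounder U differs by delta_u between genotypes."""
    return beta_causal + beta_uy * delta_u / (beta_zx + beta_ux * delta_u)


# Hypothetical values: true effect 0.5, confounder effects 1 on X and 2 on Y,
# chance imbalance of 0.05 units of U associated with the instrument
for beta_zx in (1.0, 0.2, 0.05, 0.0):
    est = estimate_with_imbalance(0.5, beta_zx, beta_ux=1.0, beta_uy=2.0, delta_u=0.05)
    print(f"beta_ZX = {beta_zx:4.2f} -> estimate {est:.3f}")
```

As `beta_zx` approaches zero, the bias term approaches `beta_uy / beta_ux` (here 2), so the estimate approaches 0.5 + 2 = 2.5; with `delta_u = 0` the true effect is recovered regardless of instrument strength.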

### Multiple Instruments Approach

Using multiple valid instruments can help to address weak instrument bias. Adding multiple uncorrelated SNPs (i.e., SNPs in linkage equilibrium) to a 2SLS model can increase the statistical power but might also increase the relative bias if weak instruments are added (28).

Alternatively, an allele score can be generated from the instruments and included as a single variable in the association model. This allele score *Z* is calculated per individual as the weighted or unweighted sum of the numbers of risk- or trait-increasing alleles $Z_i$ of each SNP *i*, where the effect $\beta_{Z_iX}$ of each SNP on the exposure *X* is used as the weight: $Z = \beta_{Z_1X}Z_1 + \beta_{Z_2X}Z_2 + \cdots + \beta_{Z_kX}Z_k$. In an unweighted score, all $\beta_{Z_iX}$ are set to 1, and the allele score of an individual simplifies to the sum of its risk alleles. Using an allele score increases the F-statistic because the model has fewer degrees of freedom. However, it has been shown that the unweighted score has lower power than adding multiple IVs to the 2SLS, whereas an appropriately weighted allele score performs similarly. The causal effect is slightly less biased when a weighted allele score is used but might have slightly lower precision (and power) than the multiple-IV 2SLS estimator. In general, effects obtained from external studies should be used as weights (28).
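Computing the allele score itself is a weighted sum per individual. In this sketch the function name, weights and genotypes are hypothetical; passing no weights gives the unweighted score, and the genotypes are assumed to be coded as counts of the trait-increasing allele of each SNP.

```python
def allele_score(genotypes, weights=None):
    """Allele score of one individual.

    genotypes: number of exposure-increasing alleles (0, 1, 2) per SNP.
    weights: per-SNP effects beta_ZiX on the exposure, ideally from an external
    study; None gives the unweighted score (all weights equal to 1).
    """
    if weights is None:
        weights = [1.0] * len(genotypes)
    if len(weights) != len(genotypes):
        raise ValueError("one weight per SNP is required")
    return sum(w * g for w, g in zip(weights, genotypes))


# Hypothetical external SNP-exposure effects and one individual's genotypes
betas = [0.10, 0.20, 0.30]
person = [1, 0, 2]
print(allele_score(person))         # unweighted: number of risk alleles
print(allele_score(person, betas))  # weighted by the SNP-exposure effects
```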

A third method for taking advantage of multiple IVs is to combine the ratio estimates (triangulation) of single instruments using inverse variance weighting (29, 30). The method for combining the results is the same as that used for meta-analyses and is implemented, for example, in the R package *metafor*. Alternatively, the following simplification of this calculation can be used (31–33):

$$\hat{\beta}_{XY} = \frac{\sum \beta_{ZX}\,\beta_{ZY}\,\operatorname{var}\left(\beta_{ZY}\right)^{-1}}{\sum \beta_{ZX}^2\,\operatorname{var}\left(\beta_{ZY}\right)^{-1}}$$

with its approximate $SE(\hat{\beta}_{XY}) = \sqrt{\frac{1}{\sum \beta_{ZX}^2\,\operatorname{var}\left(\beta_{ZY}\right)^{-1}}}$, where the sums run over the SNP-specific estimates. This method is implemented in the R package *gtx*.
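The inverse variance weighted formula and its SE can be written in Python; the second function checks numerically that the same value results from an inverse-variance meta-analysis of the per-SNP ratio estimates when their first-order SEs $SE(\beta_{ZY})/|\beta_{ZX}|$ are used. Both functions, like the formula, treat $\beta_{ZX}$ as fixed (no uncertainty), and their names and inputs are hypothetical.

```python
import math


def ivw(beta_zx, beta_zy, se_zy):
    """IVW causal estimate and approximate SE from k SNP-level estimates."""
    w = [1.0 / s ** 2 for s in se_zy]  # var(beta_ZY)^-1
    num = sum(bx * by * wi for bx, by, wi in zip(beta_zx, beta_zy, w))
    den = sum(bx ** 2 * wi for bx, wi in zip(beta_zx, w))
    return num / den, math.sqrt(1.0 / den)


def ivw_via_meta_analysis(beta_zx, beta_zy, se_zy):
    """Fixed-effect meta-analysis of per-SNP ratio estimates with
    first-order SEs se_ZY / |beta_ZX|."""
    ratios = [by / bx for bx, by in zip(beta_zx, beta_zy)]
    weights = [(bx / s) ** 2 for bx, s in zip(beta_zx, se_zy)]
    est = sum(r * wi for r, wi in zip(ratios, weights)) / sum(weights)
    return est, math.sqrt(1.0 / sum(weights))


# Hypothetical summary statistics, all aligned to the exposure-increasing allele
bzx = [0.10, 0.20, 0.15]
bzy = [0.04, 0.11, 0.09]
se_zy = [0.010, 0.020, 0.015]
print(ivw(bzx, bzy, se_zy))
print(ivw_via_meta_analysis(bzx, bzy, se_zy))
```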

However, it is crucial that the effects of all IVs used in the calculation refer to the allele with the same direction of effect on the exposure (e.g., the trait-increasing allele). Problems of missing data may also occur, especially when multiple IVs are used. Well-established methods for imputing missing genotypes based on the linkage disequilibrium structure of the human genome are now available to circumvent this problem (34–36).
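Allele alignment is mostly bookkeeping. The sketch below assumes one hypothetical SNP record per study, with alleles given as strings and a made-up function name; it first expresses the outcome estimate relative to the exposure study's effect allele and then flips both estimates so that the effect allele increases the exposure. Strand ambiguities (e.g., palindromic A/T or C/G SNPs) are deliberately not handled.

```python
def harmonise(exp_ea, exp_oa, beta_zx, out_ea, out_oa, beta_zy):
    """Align one SNP's outcome estimate to the exposure effect allele, then
    orient both estimates to the exposure-increasing allele.

    Returns (effect_allele, other_allele, beta_zx, beta_zy).
    """
    if (out_ea, out_oa) == (exp_oa, exp_ea):
        beta_zy = -beta_zy  # outcome study reported the other allele
    elif (out_ea, out_oa) != (exp_ea, exp_oa):
        raise ValueError("alleles do not match between studies")
    if beta_zx < 0:
        # Switching the reference allele flips the sign of both effects
        exp_ea, exp_oa = exp_oa, exp_ea
        beta_zx, beta_zy = -beta_zx, -beta_zy
    return exp_ea, exp_oa, beta_zx, beta_zy


# Hypothetical SNP: allele A lowers the exposure; the outcome GWAS reports allele G
print(harmonise("A", "G", -0.2, "G", "A", -0.05))
```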

### Bias due to Violation of Assumptions 2 and 3

Importantly, only valid instruments should be included in MR analyses. If the assumptions are not fulfilled, different types of bias can occur, leading to invalid causal effect estimates. Violation of the second assumption (the exclusion restriction) implies that there is at least a partial effect of the instrument on the outcome that is not mediated by the exposure, i.e., $\alpha_{ZY} \neq 0$ (**Figure 1**). Depending on the direction and strength of these pleiotropic effects, the causal effect will be over- or underestimated. As shown within the principle of triangulation, the estimated causal effect $\hat{\beta}_{XY}$ is then the sum of the true causal effect $\beta_{\text{causal}}$ and a bias term: $\hat{\beta}_{XY} = \beta_{\text{causal}} + \alpha_{ZY}/\beta_{ZX}$ (26). The bias increases with larger pleiotropy (larger absolute $\alpha_{ZY}$ in the numerator) or weaker instruments (smaller absolute $\beta_{ZX}$ in the denominator). Violation of assumption 3 leads to a bias similar to the weak instrument bias. In this case, the difference in the confounder *U* between genotype groups, and thus its contribution to exposure and outcome, does not arise by chance but systematically, because of the non-zero effect of the instrument *Z* on *U*. Thus, increasing the sample size will not remove the bias, because $\operatorname{mean}(\Delta U) \neq 0$.
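A small simulation illustrates that a pleiotropic effect $\alpha_{ZY}$ does not average out in larger samples. The model, parameter values and function name are hypothetical (true effect 0.5, $\beta_{ZX} = 1$, $\alpha_{ZY} = 0.2$), so the ratio estimate should approach $0.5 + 0.2/1 = 0.7$ rather than 0.5 as the sample size grows.

```python
import random


def simulate_wald_ratio(n, beta_causal=0.5, beta_zx=1.0, alpha_zy=0.2, seed=3):
    """Wald ratio cov(Z, Y) / cov(Z, X) when Z also affects Y directly (alpha_zy)."""
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        zi = (rng.random() < 0.3) + (rng.random() < 0.3)  # allele count
        ui = rng.gauss(0, 1)                              # confounder
        xi = beta_zx * zi + ui + rng.gauss(0, 1)
        yi = beta_causal * xi + alpha_zy * zi + ui + rng.gauss(0, 1)
        z.append(zi)
        x.append(xi)
        y.append(yi)
    mz, mx, my = sum(z) / n, sum(x) / n, sum(y) / n
    cov_zx = sum((a - mz) * (b - mx) for a, b in zip(z, x))
    cov_zy = sum((a - mz) * (c - my) for a, c in zip(z, y))
    return cov_zy / cov_zx


expected = 0.5 + 0.2 / 1.0  # beta_causal + alpha_ZY / beta_ZX
for n in (5_000, 50_000):
    print(n, round(simulate_wald_ratio(n), 3), "expected about", expected)
```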

### InSIDE Condition and Egger MR

Pleiotropic effects $\alpha_{ZY}$ of each IV also enter the model when the multiple instruments approach is applied. In this scenario, however, it is possible to replace the exclusion restriction with a weaker assumption, as explained below. If the ratio estimates of multiple instruments are combined via 2SLS or inverse variance weighting, equation (1) will result in

$$
\hat{\beta}_{XY} = \beta_{\text{causal}} + \frac{\sum \beta_{ZX}\,\alpha_{ZY}\,\operatorname{var}\left(\beta_{ZY}\right)^{-1}}{\sum \beta_{ZX}^2\,\operatorname{var}\left(\beta_{ZY}\right)^{-1}},
$$

where $\beta_{\text{causal}}$ equals the right side of (1), and the second term is a bias depending on $\alpha_{ZY}$ and $\beta_{ZX}$. Thus, an unbiased causal effect will be obtained if assumption 2 holds, i.e., all direct effects $\alpha_{ZY}$ of the IVs on the outcome *Y* are zero. However, it is sufficient for the bias term to equal zero if the pleiotropic effects $\alpha_{ZY}$ of all genetic IVs cancel out. As shown below, this cancellation is sufficiently fulfilled if the correlation between the direct genetic effects $\alpha_{ZY}$ on the outcome and their effects $\beta_{ZX}$ on the exposure *X* (i.e., the strength of the IV) is zero. This independence between the genetic effects $\alpha_{ZY}$ and $\beta_{ZX}$ is called the InSIDE condition (Instrument Strength Independent of Direct Effect). If the InSIDE condition holds together with assumptions 1 and 3, an adaptation of Egger regression can be used to obtain a consistent causal estimate even in specific cases in which the exclusion restriction is violated. Egger regression for MR is an implementation of meta-regression in which the (total) SNP-outcome effect $\Gamma = \beta_{ZX}\beta_{\text{causal}} + \alpha_{ZY}$ of each SNP is regressed on the corresponding SNP-exposure effect $\beta_{ZX}$: $\Gamma \sim \beta_{0E} + \beta_E\,\beta_{ZX}$, where the slope $\beta_E$ is the bias-reduced causal estimate (**Figure 2**). The principle behind this regression is that $\Gamma$ is proportional to the strength of the instrument $\beta_{ZX}$, with intercept $\beta_{0E} = 0$, for valid instruments, whereas under the InSIDE condition (i.e., $\alpha_{ZY}$ and $\beta_{ZX}$ are uncorrelated) stronger instruments are expected to have a relatively small bias and are thus on average closer to the true causal effect than weak instruments. As the slope of Egger MR can be calculated by the least squares estimator

$$
\beta_E = \frac{\operatorname{cov}\left(\Gamma, \beta_{ZX}\right)}{\operatorname{var}\left(\beta_{ZX}\right)} = \beta_{\text{causal}} + \frac{\operatorname{cov}\left(\beta_{ZX}, \alpha_{ZY}\right)}{\operatorname{var}\left(\beta_{ZX}\right)},
$$

the bias term is zero if $\alpha_{ZY}$ and $\beta_{ZX}$ are uncorrelated, which is the case under the InSIDE assumption. A non-zero intercept $\beta_{0E}$ indicates overall directional pleiotropy of the IVs (26).

FIGURE 2 | Plot of the SNP-outcome ($\Gamma$, y-axis) vs. the SNP-exposure ($\beta_{ZX}$, x-axis) regression coefficients of potential genetic instruments (i.e., SNPs) in a Mendelian randomization analysis. The true causal effect, represented by the slope $\beta_{\text{causal}}$, is shown by a dotted line, the inverse variance weighted (IVW) causal estimate $\hat{\beta}_{XY}$ by a red line, and the MR Egger regression estimate $\beta_E$ by a dark blue line. The total SNP-outcome effect $\Gamma$ is proportional to $\beta_{ZX}$ for valid instruments. For invalid instruments, when the InSIDE assumption holds, stronger instruments are on average expected to be closer to the true causal effect (i) than weak instruments (ii). The intercept $\beta_{0E}$ represents the overall directional pleiotropy of the instruments. The figure was adapted from the publication of Bowden et al., Int J Epidemiol. 2015;44(2):512–525 (26) (Creative Commons CC BY license).
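Egger regression can be sketched as a weighted least-squares line through the SNP-level estimates, with weights $\operatorname{var}(\beta_{ZY})^{-1}$ and each SNP oriented so that its effect on the exposure is positive, as in the approach of Bowden et al. (26). This is a sketch under assumptions (hypothetical function names, point estimates only, no SEs); with equal weights the slope reduces to the unweighted $\operatorname{cov}(\Gamma, \beta_{ZX})/\operatorname{var}(\beta_{ZX})$ given in the text. In the example, every SNP has the same direct effect 0.02, so the exclusion restriction is violated while InSIDE holds.

```python
def weighted_line(x, y, w):
    """Weighted least-squares fit y = a + b * x; returns (a, b)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    b = sxy / sxx
    return my - b * mx, b


def mr_egger(beta_zx, beta_zy, se_zy):
    """MR-Egger intercept beta_0E and slope beta_E (point estimates only)."""
    # Orient each SNP so that its effect on the exposure is positive
    pairs = [(bx, by) if bx >= 0 else (-bx, -by) for bx, by in zip(beta_zx, beta_zy)]
    x = [bx for bx, _ in pairs]
    y = [by for _, by in pairs]
    w = [1.0 / s ** 2 for s in se_zy]
    return weighted_line(x, y, w)


def ivw_estimate(beta_zx, beta_zy, se_zy):
    w = [1.0 / s ** 2 for s in se_zy]
    return (sum(bx * by * wi for bx, by, wi in zip(beta_zx, beta_zy, w))
            / sum(bx ** 2 * wi for bx, wi in zip(beta_zx, w)))


# Hypothetical SNPs: Gamma = 0.5 * beta_ZX + 0.02 (same pleiotropic effect for all)
bzx = [0.1, 0.2, 0.3, 0.4]
bzy = [0.5 * b + 0.02 for b in bzx]
se = [0.01] * 4
print("Egger (intercept, slope):", mr_egger(bzx, bzy, se))
print("IVW:", ivw_estimate(bzx, bzy, se))
```

Here Egger recovers the slope 0.5 with intercept 0.02, whereas the IVW estimate is pulled upwards (to about 0.567) by the directional pleiotropy.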

### Considering Statistical Power

The statistical power of an MR strongly depends on the proportion of variance of the exposure that is explained by the IV. Using multiple IVs, either by direct inclusion or as an allele score in the model, may therefore increase the power, as more variance of the exposure is explained. However, the validity of these instruments has to be ensured (37). Two-sample MR additionally provides a possibility to increase statistical power if published GWAS meta-analyses of both the exposure and the outcome are available. In this case, effect estimates based on large sample sizes of independent studies can be used to estimate the causal effect. Formulas for power calculations of MR using single instruments or allele scores are provided in the study of Burgess (37). Brion et al. (38) discuss the statistical power for a single IV and continuous outcomes in 2SLS MR, and provide an online power calculator for both continuous and binary outcomes, available at http://cnsgenomics.com/shiny/mRnd/. MR\_predictor (39) is a simulation-based tool for estimating the statistical power of complex MR settings; the Perl scripts required to run it are available via GitHub.
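A rough power calculation can be sketched with a normal approximation. This is not the formula of Burgess (37) or the calculator of Brion et al. (38): it assumes the textbook asymptotic variance of a one-sample IV estimator, $\operatorname{var}(\hat{\beta}_{XY}) \approx \sigma_\varepsilon^2 / (n \, R^2_{XZ} \, \sigma_X^2)$, where $R^2_{XZ}$ is the proportion of exposure variance explained by the instrument(s), $\sigma_X^2$ the exposure variance and $\sigma_\varepsilon^2$ the residual outcome variance (both set to 1 by default for standardized traits). The function and parameter names are hypothetical.

```python
from statistics import NormalDist


def mr_power(beta_xy, n, r2_xz, alpha=0.05, var_x=1.0, var_resid=1.0):
    """Approximate two-sided power of a one-sample IV analysis.

    Assumes SE(beta_XY) ~ sqrt(var_resid / (n * r2_xz * var_x)), where r2_xz is
    the proportion of exposure variance explained by the instrument(s).
    """
    se = (var_resid / (n * r2_xz * var_x)) ** 0.5
    std = NormalDist()
    z_crit = std.inv_cdf(1 - alpha / 2)
    shift = abs(beta_xy) / se
    return std.cdf(shift - z_crit) + std.cdf(-shift - z_crit)


# Hypothetical scenario: standardized traits, instruments explaining 2% of X
for n in (10_000, 50_000):
    print(n, round(mr_power(0.1, n, r2_xz=0.02), 3))
```

The sample size enters only through the product $n R^2_{XZ}$, which reflects the statement above that power depends strongly on the exposure variance explained by the IVs.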

### Measurement Unit of the Causal Effect Estimates

When conducting two-sample MR, the causal effect is expressed in the unit of the outcome per unit change of the exposure, as used in the respective genetic association studies of the IV with the corresponding traits (32). Some GWAS were meta-analyzed using the sample-size weighted z-score method (40) and thus do not provide effect estimates that can be directly included in a two-sample MR. However, for each SNP in Hardy-Weinberg equilibrium, the effect $\hat{\beta}$ can be estimated from its minor allele frequency *MAF*, its (large) GWAS sample size *N*, and its z-statistic *z* (which can be calculated from the inverse of the standard normal distribution using the association p-value and the corresponding effect direction) through the formula (41):

$$\hat{\beta} \approx \frac{z \cdot \sigma}{\sqrt{N \cdot 2 \cdot MAF \cdot (1 - MAF)}},$$

with the corresponding $SE(\hat{\beta}) = \hat{\beta}/z$. The SD $\sigma$ of the trait can be set to 1 to standardize the phenotype (i.e., the effect then corresponds to a change of one SD of the trait). If the outcome is a binary trait, e.g., a disease with prevalence *p* in the sample, then $\sigma = \sqrt{p \cdot (1 - p)}$.
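The conversion from z-scores to effect estimates can be sketched as follows, assuming the formula above, a two-sided p-value, and a known sign of the effect. The function names and the example SNP are hypothetical; computing the SE directly as $\sigma/\sqrt{2N \cdot MAF(1 - MAF)}$ is equivalent to $\hat{\beta}/z$ but also works when $z = 0$.

```python
import math
from statistics import NormalDist


def z_from_p(p, direction):
    """z-statistic from a two-sided p-value and the effect direction (+1 or -1)."""
    return direction * NormalDist().inv_cdf(1 - p / 2)


def beta_se_from_z(z, maf, n, sd=1.0):
    """Approximate per-allele effect and SE for a SNP in Hardy-Weinberg equilibrium."""
    se = sd / math.sqrt(n * 2 * maf * (1 - maf))  # equals beta / z
    return z * se, se


def sd_binary(prevalence):
    """SD of a binary trait with the given prevalence in the sample."""
    return math.sqrt(prevalence * (1 - prevalence))


# Hypothetical SNP from a z-score meta-analysis: p = 5e-8, negative effect
z = z_from_p(5e-8, direction=-1)
beta, se = beta_se_from_z(z, maf=0.25, n=100_000)  # in SD units of the trait
beta_bin, se_bin = beta_se_from_z(z, 0.25, 100_000, sd_binary(0.1))  # 10% prevalence
print(round(z, 3), beta, se)
print(beta_bin, se_bin)
```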

### Discussion

MR provides a method for testing causality of different traits using cross-sectional data and genetics. Although large sample sizes are required to achieve sufficient statistical power for revealing causal effects, it is often possible to overcome this limitation by using the publicly available genetic association results of large GWAS meta-analyses conducted during the last decade.

The statistical methods needed for conducting MR analyses are implemented in common statistical software frameworks. Additionally, the MR-Base platform makes it possible to conduct two-sample MR analyses both online and via the R package *TwoSampleMR*, which includes the methods discussed in this article (42). A detailed overview of different statistical methods for calculating MR is provided in the review of Burgess et al. (17).

However, it is important that the genetic associations that are used as instruments fulfil the MR assumptions to avoid calculation of biased or spurious causal estimates resulting in false causal inferences. Other than the required strong association of the genetic variant with the exposure, the remaining two assumptions are in general hard to validate.

This review emphasizes the bias that may arise from using invalid instruments, and the presented formulas should help to estimate the magnitude and direction of this bias for the specific MR study to be conducted. Using multiple instruments can help to test for violations of the MR assumptions due to pleiotropy and SNPs in linkage disequilibrium (but not for violations due to population stratification) (28), or to conduct sensitivity analyses (25). A strategy for assessing pleiotropy and population substructure specific to MR analyses is discussed, for example, in the work of Lawler et al. (9). Egger regression can be used as a multiple-IV approach to relax the exclusion restriction, and as a sensitivity analysis to test the robustness of the causal association (26). However, if the Egger MR-specific InSIDE assumption is violated, a biased causal estimate and an increased Type I error rate may result (43). Thus, the search for genetic variants that are valid IVs should be pursued as far as possible. Knowledge of the physiology or the biological pathways of the SNPs and their causal genes might be useful for selecting instruments.

The methods summarized in this review assume linear effects between exposure and outcome (or log-linear effects in case of a binary outcome) without effect modification. Addressing these limitations is a subject of future research. A method for revealing non-linear causal effects was successfully applied in an example of alcohol intake and cardiovascular traits, but this approach is subject to additional assumptions and limitations (16). With respect to effect modification, other statistical methods for binary-outcome MR, such as structural mean models or the generalized method of moments, make weaker assumptions but still do not solve this issue completely (21). Finally, methods for automatically detecting invalid instruments (e.g., due to pleiotropy) are under development (44). Selection of valid instruments remains a main challenge for automated causal inference.

### Author Contributions

AT designed and wrote the review.

### Funding

AT was supported by the SHIP study which is part of the Community Medicine Research net of the University of Greifswald, Germany, which is funded by the Federal Ministry of Education and Research (grants no. 01ZZ9603, 01ZZ0103, and 01ZZ0403), the Ministry of Cultural Affairs as well as the Social Ministry of the Federal State of Mecklenburg-West Pomerania. We acknowledge support for the Article Processing Charge from the DFG (German Research Foundation, 393148499) and the Open Access Publication Fund of the University of Greifswald.

### Acknowledgments

The author thanks the reviewers for helpful comments on the manuscript.


**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2018 Teumer. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Translating GWAS Findings to Novel Therapeutic Targets for Coronary Artery Disease

### *Le Shu 1,2, Montgomery Blencowe 1 and Xia Yang 1,2,3,4,5\**

*1 Department of Integrative Biology and Physiology, University of California, Los Angeles, Los Angeles, CA, United States, 2 Molecular, Cellular, and Integrative Physiology Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA, United States, 3 Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA, United States, 4 Institute for Quantitative and Computational Biosciences, University of California, Los Angeles, Los Angeles, CA, United States, 5 Molecular Biology Institute, University of California, Los Angeles, Los Angeles, CA, United States*

The success of genome-wide association studies (GWAS) has significantly advanced our understanding of the etiology of coronary artery disease (CAD) and opened new opportunities to reinvigorate stalling CAD drug development. However, there is a remarkable disconnect between the CAD GWAS findings and commercialized drugs. While this could indicate major untapped translational and therapeutic potential in CAD GWAS, it also brings forward extensive technical challenges. In this review we summarize the motivation to leverage GWAS for drug discovery, outline the critical bottlenecks in the field, and highlight several promising strategies, such as functional genomics and network-based approaches, to enhance the translational value of CAD GWAS findings in driving novel therapeutics.

### *Edited by:*

*Jeanette Erdmann, Universität zu Lübeck, Germany*

### *Reviewed by:*

*Sander W. van der Laan, University Medical Center Utrecht, Netherlands*

*Baiba Vilne, Technische Universität München, Germany*

> *\*Correspondence: Xia Yang, Ph.D. xyang123@ucla.edu*

### *Specialty section:*

*This article was submitted to Cardiovascular Genetics and Systems Medicine, a section of the journal Frontiers in Cardiovascular Medicine*

> *Received: 20 March 2018 Accepted: 11 May 2018 Published: 30 May 2018*

### *Citation:*

*Shu L, Blencowe M and Yang X (2018) Translating GWAS Findings to Novel Therapeutic Targets for Coronary Artery Disease. Front. Cardiovasc. Med. 5:56. doi: 10.3389/fcvm.2018.00056*

Keywords: genome-wide association study, coronary artery disease, drug targets, multi-omics, functional genomics, networks

### Introduction

Coronary artery disease (CAD) is a leading cause of mortality worldwide (1). CAD is well recognized as a complex disease with both genetic and environmental contributions (2). The heritability of CAD is estimated to be 40–50% (3), and the genetics of CAD plays an indispensable role in unraveling the pathogenic processes and ultimately facilitating the discovery of novel therapeutics. In the past decade, our understanding of the genetic architecture and mechanistic underpinnings for CAD has been substantially accelerated and broadened, primarily attributable to the successful global collaborative efforts in large-scale human genome-wide association studies (GWAS). These efforts have helped reveal hundreds of novel genetic variants demonstrating significant associations with CAD.

In contrast to the gratifying successes of GWAS, the development of CAD drugs has stagnated over the past decades, especially when compared to other therapeutic areas (4). What is particularly concerning is the fact that the drug development effort has been primarily concentrated on correcting previously established CAD risk factors such as lipid levels, coagulation factors, and hypertension, instead of targeting novel pathways revealed by recent studies (**Figure 1**). This decoupling between mechanistic discovery studies and drug development is striking. Therefore, it is of critical importance to form strategies that leverage the recent genetic discoveries from GWAS and other relevant efforts, such as multi-dimensional data integration and systems genetics, to allow for efficient identification of novel and reliable CAD drug targets. In this review, we summarize the state of CAD GWAS discovery, delineate the significant challenges of translating GWAS to drug targets, discuss successful examples of GWAS-driven CAD drug target discovery, and outline promising strategies to further catalyze the translation of CAD GWAS into novel therapeutic options.

# GWAS Discovery for CAD and Its Implications for Drug Target Discovery

The completion of the human genome project, the rapidly declining cost of genome sequencing, the rising feasibility of global multigroup collaborations, and the increasing accessibility of shared data repositories have collectively fueled the explosion of genetic studies of CAD, particularly GWAS. GWAS were typically designed to profile common known variants, often defined as variants with allele frequency ≥ 0.5% (5), on pre-designed microarrays containing primarily single nucleotide polymorphisms (SNPs). Since the first CAD GWAS in 2007 (6), more than 18 GWAS have been carried out in the past decade, the most recent and largest involving 34,541 cases and 261,984 controls (7). These studies revealed a total of 163 genetic loci linked to CAD (8) (**Figure 1**), explaining 30–40% of CAD heritability (7, 9, 10). The swift pace of GWAS has greatly facilitated the comprehensive construction of the CAD genetic landscape and has led to a rapid accumulation of potential causal variants and genes.

Overall, GWAS have played a key role not only in confirming classic CAD risk factors such as LDL cholesterol, hypertension, and coagulation, but also in highlighting the causal roles of cellular proliferation and adhesion, extracellular matrix, and inflammation (**Figure 1**), processes related to the endothelial and smooth muscle cells in the vascular wall and to the immune system (3, 11). Unfortunately, to date no novel CAD GWAS genes beyond a few involved in classic risk factors have been established as viable drug targets for CAD, a pattern that holds for GWAS of most complex traits (**Figure 1**) (12). The disconnection between CAD GWAS findings and treatment targets is disappointing and has been criticized, but it could also point to major untapped opportunities (13). In particular, the causal variants and genes involved in the new causal pathways informed by GWAS have encouraged early-stage advances in uncovering novel therapeutic options targeting vascular wall components, cell proliferation, and inflammation. For example, the *ADAMTS7* locus, coding for a metalloproteinase with thrombospondin motifs 7, was implicated in atherosclerotic progression through smooth muscle cell migration, a mechanism independent of classic CAD risk factors (14). Upon confirmation of its causal role in atherosclerosis *in vivo* (15), development of the ADAMTS7 pharmacophore has progressed towards establishing inhibitors via virtual screening (16). Tocilizumab, an anti-inflammatory agent blocking interleukin-6 signaling, was found to improve endothelial function (17). Antibodies targeting CD47, a key anti-phagocytic and tumorigenic molecule, were also shown to ameliorate atherosclerosis by stimulating efferocytosis (18).

Despite these potential promises, several factors could have complicated the extraction of therapeutic value from GWAS. First, the functional regulatory circuits from most variants to disease outcomes remain elusive. This is reflected in the difficulty of pinpointing both the causal variants and the corresponding target genes, especially for variants located in non-coding regions. In fact, the exact effector genes and functions for over 50% of the CAD GWAS loci are unclear. For example, the 9p21 locus was the strongest CAD locus but is located in a gene desert (6, 19, 20). Multiple follow-up studies have suggested several effectors for this locus, including the non-coding RNA ANRIL (21), *CDKN2A*/*CDKN2B* (22, 23), and interferon-gamma signaling (24). However, the detailed mechanism is still under debate after a decade of research (25). Moreover, even if a CAD variant is located within a gene-rich region, the nearest gene(s) may not be the functional candidate (26). Second, even if the candidate genes can be unequivocally determined, their functions are not necessarily well established, and extensive functional studies are required to derive a mechanistic understanding of how the candidate genes lead to CAD risk. Third, most common variants only confer weak to moderate CAD risk (<20% change in risk), most likely due to evolutionary pressure that selects against non-synonymous SNPs in disease genes involved in key physiological processes (12, 27–30). The prevalence of moderate/weak effect sizes of CAD risk variants makes prioritization of drug targets difficult.
Lastly, it has been suspected that the top CAD risk variants identified so far predominantly inform on genes active in the early, slow phase of CAD development, whereas variants affecting the late, rapid phases tend to be missed by GWAS because they likely depend more on specific contexts, such as particular environmental exposures or inflammatory states, that are poorly controlled in most GWAS (31). Indeed, a recent within-cases study of Crohn's disease focusing on disease course and prognosis revealed loci completely different from those derived from case-control studies (32). The same is likely true for CAD. Therefore, drug targets derived from CAD GWAS findings may not have the expected efficacy in counteracting CAD progression.

# Strategies to Fast-Forward the Translation of GWAS to Treatment Targets

To bypass the challenges facing the translation of GWAS findings to therapeutic targets as outlined above, a number of strategies have been designed and attempted. These efforts mainly focus on integrating GWAS hits with other data types that help inform on the functions of candidate genes, pathways, and networks, narrow down and prioritize the causal candidates, and leverage the matching patterns between disease mechanisms and molecular patterns of drugs (**Figure 2**).

### Use of Rare Variant Association Studies to Prioritize Targets

As discussed above, the common variants uncovered by GWAS are numerous but carry weak to modest effect sizes, making it challenging to prioritize viable targets. Disease-associated rare genetic variants (frequencies lower than 0.5%), on the other hand, usually perturb gene function more strongly and are under stronger evolutionary pressure. Rare variants, especially loss-of-function variants, therefore provide a natural setting that mimics human knockouts for assessing the phenotypic and clinical consequences of variants, and their power to inform causal disease genes and drug targets has long been recognized (**Figure 2A**) (33, 34). Aggregated rare mutations in 10 genes, including *APOA5* (35), *APOC3* (36), *ASGR1* (37), *ANGPTL3* (38), *ANGPTL4* (39), *LPA* (40), *LDLR* (35), *LPL* (41), *NPC1L1* (42), and *PCSK9* (43), have been linked to CAD risk through whole-exome or whole-genome sequencing studies. Of these 10 genes, 5 (*ANGPTL4*, *APOC3*, *LPA*, *NPC1L1*, *PCSK9*) have been explored as drug targets for CAD (3). To date, the main success of this approach is the FDA approval of PCSK9 antibodies. Carriers of inactivating mutations in *PCSK9* were found to have markedly lower LDL cholesterol levels and CAD risk, which led to the development of two FDA-approved monoclonal antibodies, alirocumab and evolocumab. However, the potential of drug discovery based on CAD rare variants is limited both by the small number of robust rare variants found so far and by their low cumulative contribution to CAD risk in the general population (9). Additionally, most of them act in previously established pathways rather than through novel mechanisms.
Nevertheless, these rare variants provide compelling causal evidence implicating the downstream genes and pathways in CAD pathogenesis, and they are more likely to be disease-specific (broad effects could be detrimental in human knockouts) and to have safer profiles, both key components of success as drug targets. Future GWAS will likely evolve from SNP array designs to whole-genome sequencing to profile both common and rare variants (44), further expanding the pool of loss-of-function variants for drug target selection. Some novel rare variants may inform on causal mechanisms not captured by common variants, or converge on genes and pathways already implicated by common variants, thereby strengthening causal inference at a more functional level.
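
Gene-level aggregation of rare variants, as in the sequencing studies above, is often done with a collapsing (burden) test. The sketch below is illustrative only and assumes genotypes coded as alternate-allele counts; it uses the 0.5% frequency cut-off from the text to define rare variants, counts individuals carrying at least one rare allele in a gene, and compares carrier counts in cases and controls with a two-sided Fisher's exact test. The function names are hypothetical, and real studies typically use regression-based burden or variance-component tests with covariate adjustment.

```python
from math import comb

RARE_MAF = 0.005  # "rare" cut-off from the text: allele frequency below 0.5%


def count_carriers(genotypes, rare_mask):
    """Number of individuals with >= 1 alternate allele at any rare variant.

    genotypes: one list per individual of alt-allele counts (0, 1, 2), one per variant.
    rare_mask: one boolean per variant, True if the variant counts as rare.
    """
    return sum(
        1 for person in genotypes
        if any(count > 0 for count, rare in zip(person, rare_mask) if rare)
    )


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the probabilities of all tables with the same margins that are no more
    likely than the observed one; integer weights keep the comparison exact.
    """
    n_case, n_carrier, n = a + b, a + c, a + b + c + d

    def weight(k):  # hypergeometric numerator for k carriers among cases
        return comb(n_carrier, k) * comb(n - n_carrier, n_case - k)

    observed = weight(a)
    lo, hi = max(0, n_case - (n - n_carrier)), min(n_case, n_carrier)
    extreme = sum(w for w in (weight(k) for k in range(lo, hi + 1)) if w <= observed)
    return extreme / comb(n, n_case)


def burden_test(case_genotypes, control_genotypes, allele_freqs):
    """Collapse rare variants of one gene; test carrier frequency in cases vs controls."""
    rare_mask = [f < RARE_MAF for f in allele_freqs]
    case_carriers = count_carriers(case_genotypes, rare_mask)
    control_carriers = count_carriers(control_genotypes, rare_mask)
    p = fisher_exact_two_sided(
        case_carriers, len(case_genotypes) - case_carriers,
        control_carriers, len(control_genotypes) - control_carriers,
    )
    return case_carriers, control_carriers, p
```

Collapsing to carrier status pools variants that are individually too rare to test, which is the same human-knockout logic described above (for example, asking whether carriers of rare inactivating variants are depleted among cases).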

### Functional Genomics to Identify and Prioritize Causal GWAS Genes

In contrast to rare coding mutations, whose target genes and downstream mechanisms can be uncovered more readily through traditional functional studies, identifying the causal genes responsible for the observed link between GWAS risk variants and CAD is not an easy task. It is estimated that two thirds of the predicted target genes of GWAS loci are not the closest genes by proximity (45, 46); thus, traditional proximity-based locus mapping could introduce false interpretations that bias drug target selection. This challenge can be substantially alleviated by functional genomics tools that explore potential mechanisms linking causal variants to biological phenotypes (**Figure 2A**) (44). Supported by next-generation sequencing, typical functional molecular traits that may be characterized include expression quantitative trait loci (eQTLs), non-coding RNAs, transcription factor binding sites, epigenetic modifications, and chromatin interactions (26). Advances in gene editing technologies such as CRISPR/Cas have also significantly improved the efficiency of validation experiments (47). Recent functional genomics studies have substantially refined the candidate causal genes for CAD loci such as *SORT1* (48), *TRIB1* (49), *ADAMTS7* (15, 50), and *TCF21* (51, 52). Notably, integrative functional genomics studies have combined genomic, epigenomic, and transcriptomic profiling to prioritize causal variants and affected genes (7, 46, 53). For example, Miller et al. integrated the Assay for Transposase-Accessible Chromatin using sequencing (ATAC-seq) and chromatin immunoprecipitation sequencing (ChIP-seq) to unravel cis-regulatory mechanisms in human coronary artery smooth muscle cells, and prioritized 64 variants across 7 candidate CAD loci, including 9p21.3, *SMAD3*, *PDGFD*, *IL6R*, *BMP1*, *CCDC97/TGFB1*, and *LMOD1* (53). Haitjema et al. also combined circular chromosome conformation capture sequencing (4C-seq) with RNA-seq and eQTL data to identify 294 novel candidate CAD genes (54). These studies greatly expand the pool of potential treatment targets for follow-up drug development efforts.

**Figure 2** (caption excerpt): …patterns between drug molecular profiles and GWAS-imputed molecular profiles of disease. (C) Network-based approaches that model CAD GWAS data along with other omics data from CAD-relevant tissues or cell types in the context of gene networks, which have the power to pinpoint key network regulators as candidate drug targets with more potent effects.

Encouragingly, functional studies following GWAS are being further catalyzed by large-scale community efforts to establish multi-cell and multi-tissue maps of regulatory annotations. Publicly available repositories such as GTEx (55), ENCODE (56), and the Roadmap Epigenomics Project (57) are gradually removing the hurdles to acquiring the multi-dimensional data needed to investigate complex traits like CAD.
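
At its core, the eQTL mapping mentioned above tests whether genotype dosage at a variant is associated with the expression of a gene. The following minimal sketch (hypothetical function name, toy data) fits an ordinary least-squares regression of expression on dosage and returns the effect size, its standard error, and the t-statistic. Production eQTL pipelines, such as those behind GTEx, additionally adjust for covariates and hidden confounders and correct for the large number of variant-gene pairs tested.

```python
from math import copysign, sqrt


def eqtl_ols(dosages, expression):
    """Regress expression on genotype dosage (0, 1, 2 copies of the alternate allele).

    Returns (beta, standard error, t-statistic) from ordinary least squares.
    """
    n = len(dosages)
    if n != len(expression) or n < 3:
        raise ValueError("need paired measurements for at least 3 samples")
    mean_x = sum(dosages) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("variant is monomorphic in these samples")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, expression))
    beta = sxy / sxx                      # expression change per alternate allele
    intercept = mean_y - beta * mean_x
    rss = sum((y - intercept - beta * x) ** 2 for x, y in zip(dosages, expression))
    se = sqrt(rss / (n - 2) / sxx)        # residual variance uses n - 2 degrees of freedom
    t_stat = beta / se if se > 0 else copysign(float("inf"), beta)
    return beta, se, t_stat
```

A positive beta means the alternate allele is associated with higher expression; whether that gene is the causal effector of a CAD locus still requires the colocalization and experimental work described above.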

### Mendelian Randomization (MR) to Facilitate Drug Target Selection

Previous successes in drug development for CAD have demonstrated the effectiveness of modulating intermediate causal risk factors, such as circulating cholesterol levels and blood pressure, in lowering CAD risk. Therefore, determining whether an intermediate phenotype that correlates with CAD status is causal is of great importance, as it can help prioritize biomarkers as intervention targets for CAD therapeutics (58, 59) (**Figure 2A**). The investigation of causal intermediate traits for CAD can be facilitated by MR, which uses genetic variants as instrumental variables to assess the causal relationship between an exposure (e.g., LDL cholesterol, HDL cholesterol, waist-to-hip ratio) and an outcome (CAD occurrence) (60). There have been both successful and ongoing efforts to develop drugs modulating LDL cholesterol, triglyceride-rich lipoproteins, and lipoprotein(a) (3), whose causal relationships with CAD have been robustly verified in MR studies (61–63). In contrast, MR studies have revealed an inconsistent relationship between HDL cholesterol and CAD (64–66). Consistent with this lack of robust support for a causal role of HDL in CAD, substantial obstacles have been met in developing inhibitors of CETP (cholesteryl ester transfer protein), a gene harboring several variants associated with HDL cholesterol levels (67). Three commercial CETP inhibitors, dalcetrapib, evacetrapib, and anacetrapib, all failed to achieve clinical efficacy in phase III clinical trials and were discontinued (68).
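
With summary-level data, MR estimates are commonly built from per-variant Wald ratios (the variant's effect on the outcome divided by its effect on the exposure), which are then combined by inverse-variance weighting (IVW). The sketch below assumes that every instrument satisfies the standard MR assumptions (association with the exposure, no confounding of the variant-outcome relationship, and no effect on the outcome except through the exposure), that exposure and outcome effects are aligned to the same allele, and that first-order standard errors suffice. Function names and numbers are illustrative only.

```python
from math import sqrt


def wald_ratio(beta_exposure, beta_outcome, se_outcome):
    """Single-variant causal estimate and its first-order standard error."""
    return beta_outcome / beta_exposure, se_outcome / abs(beta_exposure)


def ivw_estimate(instruments):
    """Fixed-effect inverse-variance-weighted MR estimate.

    instruments: iterable of (beta_exposure, beta_outcome, se_outcome) tuples taken
    from an exposure GWAS and an outcome GWAS (e.g. CAD log-odds ratios).
    """
    weighted_sum = total_weight = 0.0
    for bx, by, se_y in instruments:
        ratio, se_ratio = wald_ratio(bx, by, se_y)
        weight = 1.0 / se_ratio ** 2          # equals bx**2 / se_y**2
        weighted_sum += weight * ratio
        total_weight += weight
    if total_weight == 0:
        raise ValueError("no usable instruments")
    return weighted_sum / total_weight, sqrt(1.0 / total_weight)
```

When the outcome effects are CAD log-odds ratios, the estimate is a log-odds ratio per unit of the exposure; pleiotropy-robust methods are normally run alongside IVW because a single invalid instrument can bias it.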

In addition to well-explored causal pathways such as cholesterol and blood pressure regulation, MR studies have implicated several additional causal intermediate phenotypes, such as inflammation (69), uric acid (70), and iron status (71), that could serve as targets for future CAD drug development. Using both summary-level GWAS statistics and UK Biobank data, a recent MR study demonstrated a causal association of waist-to-hip ratio adjusted for body mass index with coronary heart disease, suggesting new opportunities to lower CAD risk by reducing abdominal obesity (72).

### GWAS-Based "Target-Free" Drug Repositioning

Drug repurposing approaches could leverage drugs already used for other diseases that target the newly uncovered CAD causal genes and pathways. For example, better understanding of CAD pathways involving inflammation and the cell cycle has promoted the repurposing of drugs for diseases such as rheumatoid arthritis (17) and cancer (18). On the other hand, given the difficulty of both identifying the causal genes from GWAS and matching them with the targets of drug compounds, "target-free" approaches have been developed that require no prior knowledge of the targets of either drugs or GWAS variants and can consider many genetic loci simultaneously (73) (**Figure 2B**). The fundamental concept behind these approaches is to impute gene expression profiles from GWAS summary statistics, compare these patterns against drug-induced gene expression profiles, and then prioritize top drug candidates whose profiles are the reverse of the GWAS-imputed signatures. This approach is especially useful for repositioning existing drugs whose chemical properties and molecular responses have been well characterized and made accessible in public data repositories such as CMap (74) and its successor, the L1000 platform (75), as well as other chemoinformatic resources (76, 77).
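
The reversal logic can be sketched as follows. This is a simplified stand-in, not the CMap connectivity score itself (which uses a rank-based, Kolmogorov-Smirnov-like statistic): it computes a Spearman correlation between a GWAS-imputed disease signature and each drug-induced expression signature over their shared genes, then ranks drugs from the most negative correlation (strongest reversal) upward. The input dictionaries, gene names, and drug names are hypothetical.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation (Pearson correlation of average ranks)."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return 0.0  # a constant signature carries no ranking information
    return cov / (vx * vy) ** 0.5


def rank_drugs_by_reversal(disease_signature, drug_signatures, min_genes=3):
    """Order drugs from strongest reversal (most negative score) to strongest mimicry.

    disease_signature: {gene: GWAS-imputed expression z-score}
    drug_signatures:   {drug: {gene: drug-induced expression change}}
    """
    scores = {}
    for drug, signature in drug_signatures.items():
        shared = sorted(set(disease_signature) & set(signature))
        if len(shared) < min_genes:
            continue  # too little overlap to score reliably
        scores[drug] = spearman(
            [disease_signature[g] for g in shared], [signature[g] for g in shared]
        )
    return sorted(scores.items(), key=lambda item: item[1])
```

Drugs at the top of the returned list push expression in the opposite direction to the disease signature and would be the first repositioning candidates; in practice, scores are compared against permutation-based null distributions before any drug is prioritized.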

To facilitate such efforts, the work by Gamazon et al. represents one of the first transcriptome imputation pipelines, in which disease-relevant gene expression is estimated from a tissue-dependent model trained on individual-level genotype data and a reference transcriptome (78). Gusev et al. and So et al. further developed transcriptome imputation methods based on GWAS summary statistics, which removed the requirement for individual genotype data (73, 79). In addition, inferring gene expression changes from GWAS enables researchers to assess transcriptome-wide associations with CAD that could yield novel candidate genes for functional and therapeutic investigation (45, 79). Although direct application of "target-free" approaches to CAD remains under-explored, a computational framework has been developed to reposition existing drugs for psychiatric disorders (73). The framework, built on a GWAS-based transcriptome imputation pipeline named MetaXcan (80), first imputed gene expression profiles in 10 brain regions for 7 psychiatric disorders based on GWAS and reference transcriptome data from GTEx (55). These disease transcriptome profiles were then matched against drug-induced gene expression profiles from the CMap database (74) to prioritize drugs with expression patterns opposite to the disease patterns. Such platforms are potentially translatable to CAD.
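
Summary-statistic transcriptome imputation rests on a simple identity: if a reference panel provides weights w that predict a gene's expression from its cis-SNPs, the association between predicted expression and disease can be approximated from the GWAS z-scores z and the SNP correlation (LD) matrix Σ as z_gene = wᵀz / √(wᵀΣw). The sketch below implements that formula, in the form used by summary-based transcriptome-wide association studies, with plain Python lists. It assumes standardized genotypes, alleles aligned between the weights and the GWAS, and an LD matrix from a matching reference population; the weights and z-scores are illustrative.

```python
from math import sqrt


def twas_z(weights, gwas_z, ld):
    """Gene-level z-score from GWAS summary statistics: (w^T z) / sqrt(w^T LD w).

    weights: expression-prediction weights for the gene's cis-SNPs (reference panel)
    gwas_z:  GWAS z-scores for the same SNPs, aligned to the same effect alleles
    ld:      SNP-by-SNP correlation matrix (list of lists) for those SNPs
    """
    m = len(weights)
    if len(gwas_z) != m or len(ld) != m:
        raise ValueError("weights, z-scores and LD matrix must cover the same SNPs")
    numerator = sum(w * z for w, z in zip(weights, gwas_z))
    variance = sum(weights[i] * ld[i][j] * weights[j] for i in range(m) for j in range(m))
    if variance <= 0:
        raise ValueError("weights predict no expression variance under this LD")
    return numerator / sqrt(variance)
```

The LD term in the denominator prevents double counting: two SNPs in perfect LD with equal weights and equal z-scores of 2 give a gene-level z of 2, not a larger value, because they carry a single signal.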

### Network-Based Drug Discovery Approaches

The success of GWAS-driven drug target identification relies heavily on a fundamental assumption about how genetic risk variants eventually contribute to disease phenotypes. An "omnigenic" model for the genetics of complex traits has recently been proposed (30). This provocative model challenges the common belief that risk variants drive disease etiology through functional clustering in biological pathways, and instead emphasizes that all genes expressed in disease-relevant cells could affect core disease processes through interconnected gene regulatory networks.

Motivated by the gene network hypothesis, the CAD field has been actively investing in the development and application of systems genetics frameworks that integrate genetics with other data dimensions in the context of network topology to help prioritize candidate CAD genes (**Figure 2C**) (27–29, 81–84). Network-based target identification strategies offer several unique advantages over other methods. First, gene networks have the potential to comprehensively map regulatory circuits under physiological or pathological conditions, thus improving the biological relevance of predicted targets. Second, gene networks serve as a natural platform for data integration, where GWAS and information from other omics layers can be jointly leveraged to pinpoint network hotspots where key perturbation events likely occur. Third, gene networks enable the identification of essential disease genes, which are unlikely to tolerate high-frequency loss-of-function variation at the population level and are therefore unlikely to be discovered by GWAS (85). Several methods have been developed to find network essential genes, or key drivers, by considering both network topology and external disease signatures (27, 81, 86). The ability of predicted key drivers to drive CAD-relevant traits has been well supported (27, 29, 82), and key drivers have the potential to serve as novel drug targets with strong therapeutic effects because of their central role in regulating disease networks. For instance, Zhao et al. recently prioritized CAD key drivers and proposed plausible targets using network approaches (28). Extensive *in vitro* and *in vivo* gene perturbation experiments are required to evaluate the feasibility of using key driver genes as drug targets. If proven valid, network-based discoveries could provide exciting opportunities to formulate more focused, data-informed hypotheses for downstream therapeutic investigation.
Nevertheless, it is important to caution that modulation of network key drivers may result in a lack of specificity and increase the risks for side effects due to their broad impact on numerous network genes.
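
A much-simplified version of the key-driver idea can be sketched as a neighborhood enrichment test: for each gene, ask whether its direct network neighbors contain more disease-associated genes than expected by chance under a hypergeometric model. The function names and toy network are hypothetical, and the sketch is narrower in scope than published key driver methods, which typically consider multi-step neighborhoods and edge directions and correct for multiple testing.

```python
from math import comb


def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population size, successes, draws)."""
    upper = min(successes, draws)
    favorable = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return favorable / comb(population, draws)


def key_driver_candidates(edges, disease_genes, alpha=0.05):
    """Flag genes whose direct neighbors are enriched for disease-associated genes.

    edges: iterable of (gene_a, gene_b) pairs of an undirected network
    disease_genes: genes carrying the disease signal (e.g. GWAS-mapped genes)
    Returns (gene, disease neighbors, degree, p-value), most significant first.
    """
    neighbors = {}
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    signal = set(disease_genes) & set(neighbors)
    results = []
    for gene, nbrs in neighbors.items():
        hits = len(nbrs & signal)
        if hits == 0:
            continue
        # Neighbors are treated as a draw from every other node in the network.
        p = hypergeom_sf(hits, len(neighbors) - 1, len(signal - {gene}), len(nbrs))
        if p < alpha:
            results.append((gene, hits, len(nbrs), p))
    return sorted(results, key=lambda r: r[3])
```

In a toy network where one hub touches half of the disease genes, the hub is the only node flagged, which illustrates both the appeal of key drivers and the caution above: a hub that controls many genes may also have broad, unwanted effects when targeted.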

One critical challenge for network-based CAD drug discovery is the availability of high-quality gene networks from CAD-relevant cells, tissues, and subjects. Many existing networks are literature-based and lack tissue or cell specificity. Even data-driven networks are subject to data collection bias. For example, constructing human networks usually requires large numbers of clinical samples that are difficult to acquire, especially from internal tissues. A major breakthrough is the establishment of the STARNET networks, built from tissue-specific data from ~600 CAD patients (87). This resource, in combination with other networks generated from mouse models or non-disease human subjects, is invaluable for future CAD network studies. Coordinated efforts by the research community are needed to expand the coverage of data-driven networks from CAD-relevant tissues and cell types.

### Conclusions and Future Directions

GWAS have been highly successful in elucidating the genetic architecture of CAD and driving the discovery of novel biology. While confirming the genes and pathways targeted by classic CAD treatments, GWAS have opened doors to a vast number of under-recognized candidates from which future CAD drugs could originate. The field of GWAS-driven drug discovery is still in its infancy, and significant challenges remain. However, it is encouraging that numerous methodological advances have been made to address the bottlenecks, and applying these approaches is expected to facilitate future translational research in CAD.

Here we anticipate the following future directions that will help further advance the field. First, broader collaboration is needed to conduct large-scale functional genomics studies in human tissues and cell types that implement cutting-edge high-throughput profiling technologies across multiple omics layers to map the tissue- and cell-type-specific regulatory circuits of GWAS loci. In particular, cell-type-specific analyses at multiple omics levels will help address the functional heterogeneity of CAD-relevant tissues, which will refine our understanding of disease etiology and lay a solid foundation for more accurately predicting drugs that counteract specific pathogenic processes in the right cell types and tissues (88). The recent launch of the Human Cell Atlas project represents one of the first steps in this direction (89). Second, more efficient platforms are needed to facilitate sharing of summary-level GWAS data, as well as databases and data repositories that curate multi-omics functional information. For example, in the field of neuropsychiatric disorders, there are emerging efforts such as CommonMind (http://commonmind.org), PsychENCODE (90), and BrainSeq (91). Similar coordinated efforts by the CAD community will accelerate the identification of CAD drug targets. Third, the translational value of GWAS data can be better exploited by developing novel analytical pipelines that integrate multi-dimensional data from animals, humans, and chemoinformatic databases. Some recently developed analytical pipelines can integrate GWAS and functional genomics data for target prediction and are directly applicable to CAD (45, 79, 81, 92). Additional pipelines that couple disease datasets with drug footprints in a gene network framework will facilitate the identification of network regulators and pathways that can be accurately targeted.
Lastly, gradual transition from initial target screening to GWAS-guided experimental validation of the predicted targets using a combination of *in vitro*, *in vivo*, and *in silico* methods will further the translational path.

### Author Contributions

LS, MB and XY drafted and edited the manuscript.

### References


### Funding

LS is funded by the Burroughs Wellcome Fund Inter-school Training Program in Chronic Diseases Scholarship and UCLA Dissertation Year Fellowship. XY is funded by the Leducq Foundation Transatlantic Network of Excellence, NIH/NIDDK DK104363, and NIH/NINDS NS103088.

### Acknowledgments

The authors thank the reviewers for the constructive comments.


circumspection. *Cardiovasc Drugs Ther* (2016) 30(1):65–74. doi: 10.1007/s10557-016-6642-9


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2018 Shu, Blencowe and Yang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Genetics of Cardiovascular Disease: Fishing for Causality

Christoph Paone<sup>1</sup>, Federica Diofano<sup>1</sup>, Deung-Dae Park<sup>1</sup>, Wolfgang Rottbauer<sup>2</sup> and Steffen Just<sup>1</sup>\*

<sup>1</sup> Molecular Cardiology, Department of Internal Medicine II, University of Ulm, Ulm, Germany, <sup>2</sup> Department of Internal Medicine II, University of Ulm, Ulm, Germany

Cardiovascular disease (CVD) is still the leading cause of death in Western countries, and genetic predisposition in combination with traditional risk factors frequently mediates its manifestation. Genome-wide association (GWA) studies have revealed numerous potentially disease-modifying genetic loci, often encompassing several SNPs and associated genes. However, genetic association alone proves neither the direct or indirect relevance of the modifier region to pathogenesis nor which element within the associated region is the exact genetic driver of the disease. Therefore, the relevance of identified genetic disease associations needs to be confirmed either in monogenic traits or by functional genomic studies in experimental *in vivo* model systems. In this review, we focus on the use of functional genomic approaches such as gene knock-down or CRISPR/Cas9-mediated genome editing in the zebrafish model to validate disease-associated genomic loci and to identify novel cardiovascular disease genes. We summarize the benefits of the zebrafish for cardiovascular research and highlight examples demonstrating the successful combination of GWA studies and functional genomics in zebrafish to broaden our knowledge of the genetic and molecular underpinnings of cardiovascular diseases.

Edited by:

Tanja Zeller, Universität Hamburg, Germany

### Reviewed by:

Krishan Kumar Vishnolia, Universität zu Lübeck, Germany

Till Joscha Demal, University Heart Center Hamburg GmbH, Germany

\*Correspondence:

Steffen Just steffen.just@uniklinik-ulm.de

### Specialty section:

This article was submitted to Cardiovascular Genetics and Systems Medicine, a section of the journal Frontiers in Cardiovascular Medicine

> Received: 21 February 2018 Accepted: 15 May 2018 Published: 01 June 2018

### Citation:

Paone C, Diofano F, Park D-D, Rottbauer W and Just S (2018) Genetics of Cardiovascular Disease: Fishing for Causality. Front. Cardiovasc. Med. 5:60. doi: 10.3389/fcvm.2018.00060

Keywords: genome-wide association study, zebrafish, functional genomics, CRISPR/Cas9, heart disease

# INTRODUCTION

Cardiovascular disease (CVD) is the leading cause of mortality worldwide. CVD describes a class of diseases affecting the heart and blood vessels, such as cardiomyopathies, coronary artery disease, heart failure, and arrhythmias. A variety of risk factors, such as smoking, obesity, hypertension, or high cholesterol, can cause CVD; however, these traditional risk factors are understood to account for only a fraction of disease cases (1). Therefore, researchers also focus on defining the genetic basis of CVD to identify disease mechanisms independent of environmental risk factors. Recent advances in next-generation sequencing (NGS) techniques now enable unbiased, whole-genome analysis of patients to identify disease-associated genetic alterations. One such approach is the genome-wide association (GWA) study (GWAS), which has emerged as a powerful tool to identify disease-related loci and has become a valuable resource of candidate disease-causing genes and variants. A GWA study is a hypothesis-free approach that uses information on hundreds of thousands of genetic variants across the genome, so-called SNPs (single nucleotide polymorphisms), in large population samples. GWAS findings are purely genetic associations, but significant associations between SNPs and a disease are excellent starting points for detailed follow-up studies. More than 10,000 such significant associations with disorders and other traits have been reported by GWA studies, yielding new insights into the biology and molecular mechanisms of various diseases (2). Online platforms like the GWAS Catalog collect data from published GWA studies and give researchers open access to these genome-wide analyses (3). The GWAS Catalog comprises studies on a variety of diseases, ranging from neurological disorders and various cancer types to cardiac diseases such as cardiomyopathies or arrhythmias.
All GWA studies rely on an exact definition of the disease phenotype in patients to obtain as homogeneous a cohort as possible. Mixed cohorts, secondary disease mechanisms, or environmental variation can lead to non-significant or underestimated results. This has been observed in particular for GWA studies of heart failure (4). Even well-designed GWA studies may lack clinical relevance because the causality of the candidate genes has not been demonstrated. Fast and reliable validation of GWAS hits therefore requires an adequate experimental model for follow-up studies. Several model systems are available, ranging from cell culture to animal models, and each has its pros and cons depending on the disease mechanism in question. During the last decades, the zebrafish (*Danio rerio*) has rapidly emerged as a model organism in cardiovascular research. In this review, we focus on the use of the zebrafish to investigate the pathomechanisms of heart diseases and discuss its suitability as an experimental tool to validate the disease association of genes identified by GWA studies.
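
The per-SNP testing that underlies a GWA study, as described above, can be illustrated with a basic allelic association test for one SNP: a 2×2 table of allele counts in cases and controls, a 1-degree-of-freedom chi-square statistic, and an allelic odds ratio. The sketch assumes non-zero cell counts and uses hypothetical function names; real GWAS typically fit regression models with covariates such as ancestry principal components. Because hundreds of thousands to millions of SNPs are tested, a stringent genome-wide significance threshold (conventionally p < 5 × 10⁻⁸) is applied.

```python
from math import erfc, sqrt

GENOME_WIDE_ALPHA = 5e-8  # conventional genome-wide significance threshold


def allelic_association(case_alt, case_ref, control_alt, control_ref):
    """Allelic chi-square test (1 df) and odds ratio for one SNP.

    Inputs are allele counts (two per genotyped individual) in cases and controls.
    """
    if min(case_alt, case_ref, control_alt, control_ref) <= 0:
        raise ValueError("this sketch assumes all four allele counts are positive")
    table = [[case_alt, case_ref], [control_alt, control_ref]]
    n = case_alt + case_ref + control_alt + control_ref
    row_totals = [case_alt + case_ref, control_alt + control_ref]
    col_totals = [case_alt + control_alt, case_ref + control_ref]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    p_value = erfc(sqrt(chi2 / 2))  # upper tail of a chi-square distribution with 1 df
    odds_ratio = (case_alt * control_ref) / (case_ref * control_alt)
    return chi2, p_value, odds_ratio
```

With toy counts of 30/70 alternate/reference alleles in cases and 10/90 in controls, the statistic is 12.5 and p ≈ 4 × 10⁻⁴: nominally associated, but far from genome-wide significance, which is why follow-up validation such as the zebrafish studies discussed here starts from loci that pass the stricter threshold.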

### THE ZEBRAFISH: SMALL FISH, BIG IMPACT

Zebrafish possess a variety of features that make them advantageous as an experimental model organism. Because of their small size (2–4 cm), zebrafish are easy to handle, and one female can produce around 200 eggs per week. Zebrafish embryos develop externally and very rapidly into free-swimming, feeding larvae within 5 days (5, 6). The zebrafish is an excellent system for microscopic applications, as embryos are transparent and numerous transgenic fluorescent reporter lines are available or can easily be produced (7). Such reporter lines are widely used to image organ development and morphology as well as physiological parameters such as membrane voltage or calcium transients (8, 9). Because of their suitability for imaging applications, zebrafish are also highly attractive for high-throughput small-compound screens. This is enabled by existing and continuously improving screening platforms, e.g., for the automated detection of heartbeat, heart rate, and fractional shortening in embryos or in isolated hearts of adult zebrafish (10–13). Such set-ups facilitate rapid, high-throughput preclinical testing of large numbers of small molecules and help identify novel therapeutic strategies (14–17).
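
The cardiac readouts mentioned above reduce to simple arithmetic once beats and chamber dimensions have been measured. The sketch below assumes that beat times (in seconds) have already been extracted, for example as peaks of an intensity trace from a video recording, and that ventricular diameters at end-diastole and end-systole are given in the same unit; the function names are hypothetical, and the image analysis performed by the screening platforms is not shown.

```python
def heart_rate_bpm(beat_times_s):
    """Mean heart rate (beats per minute) from consecutive beat timestamps in seconds."""
    if len(beat_times_s) < 2:
        raise ValueError("need at least two beats")
    intervals = [later - earlier for earlier, later in zip(beat_times_s, beat_times_s[1:])]
    if any(dt <= 0 for dt in intervals):
        raise ValueError("beat times must be strictly increasing")
    return 60.0 / (sum(intervals) / len(intervals))


def fractional_shortening_pct(end_diastolic_diameter, end_systolic_diameter):
    """Fractional shortening (%) = (EDD - ESD) / EDD * 100, diameters in the same unit."""
    if end_diastolic_diameter <= 0:
        raise ValueError("end-diastolic diameter must be positive")
    return 100.0 * (end_diastolic_diameter - end_systolic_diameter) / end_diastolic_diameter
```

For example, beats every 0.4 s correspond to 150 bpm, and a ventricle narrowing from 100 to 70 µm has a fractional shortening of 30%; in a compound screen, these values would be compared between treated and untreated embryos.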

Besides these general advantages, zebrafish exhibit characteristics that make them well suited to study heart development and disease (16, 18–21). Zebrafish heart development proceeds quickly and results in a differentiated two-chambered heart within 48 hours post-fertilization (hpf) (22). In addition, zebrafish embryos, in contrast to mammalian or avian embryos, can cover their oxygen demand by diffusion during the first days of development and do not depend on blood circulation. This enables the investigation of gene knockouts or knockdowns even if they lead to severe defects of the cardiovascular system (23).

At the genetic level, approximately 70% of human genes have at least one zebrafish orthologue, and 84% of human disease-causing genes have a counterpart in the zebrafish genome (24, 25). However, there are basic morphological differences in the cardiovascular system, as the zebrafish heart consists of only two chambers, one atrium and one ventricle. On the one hand, this is advantageous because it provides a simplified experimental model; on the other hand, these anatomical differences may limit the translation of findings to mammalian systems (21). Unlike mammals, which develop a coronary system during embryogenesis, zebrafish develop a vasculature on the heart surface only from 1–2 months post-hatching (26). This restricts the study of coronary artery disease (CAD) to adult zebrafish, although basic mechanisms of atherosclerotic lesion development can also be analyzed in the vasculature of zebrafish embryos (27). Several parameters of the zebrafish heart are also closer to the human situation than those of mammalian model organisms such as the mouse (17). For example, the zebrafish heart rate of 120–180 bpm (beats per minute) is comparable to the 60–100 bpm of the human heart, whereas the mouse heart beats about 5 times faster. Furthermore, zebrafish ECG parameters are very similar to human values, enabling direct comparison and translation of experimental findings (20, 28).

In addition to the great benefits of the zebrafish with regard to organ development, physiology, handling, and imaging, its amenability to genetic manipulation is another major advantage of the system (**Figure 1**). Here, we give a concise overview of the repertoire of zebrafish genetic tools and highlight examples in which they have been used to demonstrate the causality of genes or loci identified by GWAS.

### ZEBRAFISH GENETIC SCREENS: A (SWIMMING) POOL OF DISEASE GENES

Before GWAS data became widely accessible, candidate-driven approaches were very successful in identifying disease-associated mutations. In these approaches, known molecular players and/or regulators of a specific disease-related pathway are screened in a cohort or hereditary trait to find an association with the pathological outcome. Forward genetic screens in zebrafish contributed substantially to these studies, as a variety of genes responsible for cardiovascular defects were identified through zebrafish mutant lines arising from mutagenesis screens (18, 29). These screens, like GWAS, have the advantage of being hypothesis-free approaches that identify genetic mutations via a randomly induced phenotype. The most prominent mutagenesis screens are based on alkylating agents such as N-ethyl-N-nitrosourea (ENU), which induce point mutations (e.g., nonsense or missense mutations) that can affect the regulatory and coding regions of genes (30–32).

Although such mutagenesis screens mainly allow the analysis of recessive, monogenically inherited mutations, the combination of zebrafish forward genetics with subsequent human genome analysis has led to the identification of several disease-related genes. One example is the zebrafish mutant main squeeze (msq), which harbors a mutation in the gene encoding ILK (integrin-linked kinase). Msq mutants display a progressive loss of ventricular contractility leading to heart failure (33). Another ILK mutant line, lost contact (loc), also displays a cardiomyopathy phenotype (34). After identifying ILK mutations as causative for the loc mutant phenotype, Knöll et al. performed a mutation screen of the ILK gene in human cardiomyopathy patients. This screen revealed a disease-associated ILK mutation whose disease-causing effect was again validated in loc mutants. These examples show that zebrafish can serve (I) as a resource for new candidate genes in heart failure through forward genetic screens and (II) as a model organism to validate potential disease-causing mutations in reverse genetic analyses.

### REVERSE GENETIC APPROACHES IN ZEBRAFISH

Reverse genetics can be regarded as the targeted investigation of a gene of interest by increasing, reducing, or silencing its expression. A diversity of reverse genetic tools can be applied in zebrafish; however, several characteristics of zebrafish genetics have to be kept in mind. Zebrafish underwent a whole-genome duplication event, with the consequence that a partially redundant paralog is present for many genes (24, 35). In addition, several transcripts of the same gene often exist, and the knockdown or knockout of several genes might be necessary to model the loss-of-function phenotype of a human ortholog. Another aspect to consider is the genetic variation between and within zebrafish strains, which might affect the phenotype and the conclusions drawn from functional analyses (36).

An important and helpful resource for reverse genetic investigations is the Zebrafish Mutation Project, which provides a growing list of fish lines with a defined mutation in a specific gene (25). These mutations are induced by chemical mutagenesis, similar to that used in forward genetic screens, and identified by high-throughput DNA genotyping, an approach called TILLING (targeting induced local lesions in genomes) (37, 38). If a desired and appropriate mutation is available, this open-access resource can give scientists rapid access to a loss-of-function model that can be used directly for functional studies. The reverse genetic tools that can be applied in zebrafish are mainly (A) mRNA overexpression, (B) transgenesis, (C) morpholino-modified antisense oligonucleotide (MO)-mediated knockdown, and (D) genome editing techniques such as ZFNs (zinc finger nucleases), TALENs (transcription activator-like effector nucleases), or the CRISPR/Cas9 system (clustered regularly interspaced short palindromic repeats).

### mRNA Overexpression

Injecting synthetic capped mRNA encoding the protein of interest into early embryonic stages is a standard method to induce transient overexpression of genes or gene variants (e.g., SNPs/variants identified by GWAS or next-generation sequencing) for gain-of-function or loss-of-function studies. For example, mRNA overexpression in zebrafish was used to analyze mutations in the NEXN (nexilin) gene that had been identified in patients with DCM (dilated cardiomyopathy) (39). When overexpressed in zebrafish embryos, these mutant NEXN variants induced a severe DCM phenotype, showing that the method is suitable for fast and effective testing of the impact of putative mutations. Even though mRNA overexpression is an effective way to elucidate the functions of specific genes, its use is restricted to early organ development and function because the injected mRNA is only stable for a limited time.

### Transgenesis

Transgenesis in zebrafish involves the insertion of foreign DNA into the genome and is often used to create reporter lines, in which a fluorescent reporter gene under the control of a specific promoter labels a particular tissue, organ, or cell type. The most commonly used system to insert a transgene into the zebrafish germline is the Tol2 system derived from medaka fish. This autonomously active Tol2 element harbors a gene encoding a transposase that mediates the transposition of the Tol2 element into the genome (40). For transgenesis in zebrafish, the sequence or gene of interest needs to be flanked by the 150–200 bp ends of the Tol2 element. Injection of this construct together with in vitro transcribed transposase mRNA leads to the highly efficient generation of transgenic F1 offspring (41, 42). For zebrafish heart development and function, a variety of transgenic lines are available, such as cmlc2 (myosin light chain 7, myl7) promoter-driven reporter lines that specifically label the cardiomyocytes of both heart chambers (43, 44). In addition, random insertional transgenesis of EGFP, so-called enhancer trapping, was shown to yield various reporter lines that specifically label cardiac structures (45). A powerful combination is the use of transgenic lines in cell transplantation experiments, which are widely used in zebrafish embryos to investigate cell-autonomous mechanisms. With this approach, Sawamiphak and colleagues, for example, analyzed fusion events between cardiomyocytes during heart development that enable the exchange of mRNA or proteins between individual cells (46).

Furthermore, stable transgenic expression of gene variants associated with heart diseases can serve as an appropriate in vivo model to study the underlying pathology. Huttner and coworkers, for example, showed that transgenic expression of the D1275N mutation of the human cardiac sodium channel (SCN5A), which is associated with cardiac abnormalities in humans, also leads to bradycardia and defects of the cardiac conduction system in zebrafish (47). Further developments of transgenesis techniques with regard to tissue specificity or inducibility will broaden the possibilities for transgenesis in zebrafish and help to create improved experimental systems for cardiac research (48, 49).

### Morpholino-Mediated Knockdown

Morpholinos (MOs) are knockdown reagents that are very stable, resistant to nucleases, and can be injected into 1-cell stage zebrafish embryos. They have therefore become a standard approach for gene knockdown in zebrafish (50, 51). In a variety of cardiovascular research studies, MO-mediated knockdowns were performed to analyze the disease association of a particular gene and/or to model specific pathological features. For example, knockdown of genes that are associated with DCM progression in humans also results in cardiomyopathy in zebrafish (39, 52). However, phenotypes induced by MOs may be more severe than those of the corresponding mutants. This discrepancy can result from genetic compensation in the mutant or from off-target effects of the MO used (53–55). Therefore, MO specificity, efficiency, and toxicity should be properly controlled in all applications (51).

### Genome Editing Techniques

Genome editing has evolved into a major strategy to disrupt the coding sequence of genes of interest, leading to a loss of function. In recent years, various CRISPR/Cas9, ZFN, and TALEN approaches were developed and applied in zebrafish research to create gene knockouts. The detailed technical aspects are not the focus of this review but are reviewed elsewhere (56, 57). ZFN and TALEN approaches were successfully applied in zebrafish cardiovascular research studies (58, 59). For instance, ZFN-mediated knockout of GATA2 results in severe defects in vascular organization, highlighting the importance of this gene for cardiovascular development (60).

The discovery of CRISPR/Cas9 as a genome editing method heralded a new era of reverse genetics (61, 62). The CRISPR/Cas9 system is the most efficient genome editing method for reverse genetics in zebrafish and, owing to its simplicity and applicability, has many advantages compared with ZFNs and TALENs (63). The CRISPR/Cas9 system is a two-component complex composed of the Cas9 endonuclease, which induces double-strand breaks (DSBs), and a guide RNA (gRNA) that recognizes a specific DNA sequence (62). Because of its unique mechanism, CRISPR/Cas9 is remarkably simple and adaptable and has therefore become the major genome-editing tool in the zebrafish field (64). Its suitability for reverse genetics in cardiovascular research was shown, for example, by a knockout of the large transcript pr130 of the protein phosphatase 2 regulatory subunit Bα (PPP2R3A) (65). Here, two pr130 knockout lines demonstrated the importance of pr130 for cardiac development and function and provide a suitable genetic model to study the underlying pathomechanisms. An aspect that needs to be considered in all genome editing approaches is the possible presence of off-target effects. Unbiased whole-genome analyses of CRISPR/Cas9 off-target effects are still missing, and researchers are most often restricted to analyzing off-target sites predicted by computational approaches (66). By careful design and selection of gRNA sequences and the use of nuclease variants with high specificity, the risk of off-target effects can be minimized. Additionally, continuous outcrossing of the mutation and the comparison of at least two independently generated knockout lines help to prevent misinterpretation of a genotype-phenotype connection.
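The gRNA only directs Cas9 to sites where its roughly 20-nt guide sequence is followed by a protospacer-adjacent motif (PAM), so candidate target sites can be enumerated by scanning both DNA strands. The Python sketch below assumes the widely used Streptococcus pyogenes Cas9 with its canonical NGG PAM (a detail not stated in the text above) and adds a deliberately naive mismatch count as a stand-in for the computational off-target prediction mentioned above. All function names are hypothetical; dedicated design tools additionally weight mismatch position, non-canonical PAMs, and DNA/RNA bulges.

```python
import re

# Complement table for the four bases (plus N for unknown bases).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_spcas9_sites(seq: str, guide_len: int = 20) -> list[tuple[int, str, str]]:
    """List (start, strand, protospacer) for every guide_len-nt site followed by NGG.

    `start` is the leftmost 0-based coordinate of the whole site
    (protospacer + 3-nt PAM) on the input (+) strand.
    Assumption: SpCas9 with a canonical NGG PAM.
    """
    seq = seq.upper()
    site_len = guide_len + 3
    # The zero-width lookahead lets overlapping sites all be reported.
    pattern = re.compile(r"(?=([ACGT]{%d})[ACGT]GG)" % guide_len)
    sites = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for m in pattern.finditer(s):
            # Convert minus-strand hits back to + strand coordinates.
            start = m.start() if strand == "+" else len(s) - m.start() - site_len
            sites.append((start, strand, m.group(1)))
    return sites


def mismatches(a: str, b: str) -> int:
    """Hamming distance between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def count_candidate_off_targets(guide: str, background: str, max_mismatches: int = 3) -> int:
    """Naively count NGG-adjacent sites in `background` within `max_mismatches` of `guide`."""
    return sum(
        1
        for _, _, site in find_spcas9_sites(background, len(guide))
        if mismatches(guide, site) <= max_mismatches
    )


if __name__ == "__main__":
    target = "AAAAA" + "GACTTACGATCGATCGTAGC" + "TGG" + "AAAAA"  # toy sequence
    print(find_spcas9_sites(target))  # [(5, '+', 'GACTTACGATCGATCGTAGC')]
```

In practice the background would be the whole zebrafish genome, which is why real tools rely on indexed search rather than a linear scan like this one.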

Recently, a variety of improvements and new applications of the CRISPR/Cas9 system have emerged, contributing to the rapid adoption of the method in many zebrafish laboratories. The classical targeted knockout strategy involves the injection of gRNA and Cas9 (mRNA or protein) into 1-cell stage embryos and screening for germline mutations in subsequent generations (67). Another strategy uses a catalytically dead Cas9 protein (dCas9) lacking endonuclease activity to generate a DNA recognition complex that can specifically perturb transcriptional elongation, RNA polymerase binding, or transcription factor binding (68). CRISPR/Cas9 can also be used to generate defined knock-in fish lines with integrated SNPs, stop codons, HA tags, loxP sites, or fluorescent proteins (69, 70). The CRISPR/Cas9 toolbox is continuously growing, and recent progress has been achieved by using this method for tissue-specific blockage of gene function (71, 72) or by combining it with optogenetic tools to gain temporal control over Cas9 activity (73). In the context of cardiovascular research, these improvements will help to obtain heart-specific knockouts and to mimic the late onset and slow progression of many cardiomyopathy subtypes.

## GWA STUDIES AND FUNCTIONAL GENOMICS IN ZEBRAFISH: A POWERFUL COMBINATION

The zebrafish functional genomics toolbox enables a defined in vivo analysis of theoretically any gene of interest. This makes the zebrafish a valuable experimental platform to validate putative disease-causing genes identified by GWA studies. Indeed, a variety of genome-wide surveys focusing on heart diseases have already used zebrafish to validate their initial findings. **Table 1** summarizes selected examples of genome-wide studies whose candidate genes could be confirmed by zebrafish reverse genetics. An early GWA study in 2008 identified three co-segregating genes (HBEGF, IK, and SRA1) associated with DCM (87). For heparin-binding EGF-like growth factor (HBEGF), the link to DCM progression was already known from mouse knockout studies (88). The DCM association of the cytokine IK and the steroid receptor RNA activator 1 (SRA1) is a new connection arising from this study. The disease relevance of these candidate genes could be verified in zebrafish embryos: MO-mediated knockdown of each of the three genes, HBEGF, SRA1, and IK, resulted in severe pericardial edema accompanied by reduced fractional shortening (FS) of the ventricular chamber (87). Another study focused on the genetic basis of CAID (chronic atrial and intestinal dysrhythmia) and found a linkage to the SGOL1 gene (shugoshin-like 1) (86). The authors showed that SGOL1 is expressed in the sinoatrial region and atrioventricular valves of the adult zebrafish heart. Consistent with this expression pattern, the knockdown of SGOL1 in zebrafish embryos resulted in bradycardia, confirming the involvement of SGOL1 in heart rhythm control (86). KCNIP1 (potassium voltage-gated channel interacting protein 1) is another example of a gene that could be linked to heart disease, here atrial fibrillation (AF), by whole-genome analysis (82). Interestingly, the reported mutation does not lead to a loss of function but is suggested to increase KCNIP1 levels.
The authors modeled this by overexpressing KCNIP1 in zebrafish and showed that increased KCNIP1 levels can result in transient atrial tachycardia and AF during high-rate pacing (82). Norton et al. (79) identified BAG3 (Bcl-2-associated athanogene 3) as a DCM-associated gene and confirmed its disease relevance by knocking down BAG3 in zebrafish embryos. BAG3-deficient fish showed severe pericardial edema and decreased fractional shortening as well as a reduced peak blood cell flow velocity (79). A second study also independently identified BAG3 as a potential DCM-causing gene (78). In addition, the functional requirement of BAG3 for heart as well as skeletal muscle function was confirmed by an independent MO-based analysis of several myopathy-related genes (80). Another gene linked to DCM by a whole-genome study is filamin C (FLNC) (81). The authors of this study also used MO knockdown experiments to validate their findings: FLNC morphants exhibited dysmorphic or dilated heart chambers as well as impaired heart looping, confirming the importance of FLNC for heart function and development (81). Lundby et al. (83) used a GWA approach combined with tissue-specific proteomics to analyze genes associated with LQTS (long QT syndrome). They identified vinculin (VCL) as a disease-associated gene and confirmed its relevance using a VCL knockdown approach in zebrafish. In these experiments, the authors measured cardiac repolarization in isolated embryonic hearts using fluorescent probes and observed an impaired repolarization response upon loss of VCL (83). Additionally, using a gene-trap mutant zebrafish line as well as a CRISPR/Cas9 knockout line of VCLb, Cheng et al. confirmed its disease relevance (85), and an MO-based knockdown of VCL in zebrafish in another independent approach also validated the role of vinculin in heart function and structure (84).

TABLE 1 | Genome-Wide Studies using the zebrafish model to validate the causality of candidate genes.

Mentioned here are examples found by manual database analysis. The list might not be exhaustive.

AF, atrial fibrillation; CAID, chronic atrial and intestinal dysrhythmia; DCM, dilated cardiomyopathy; GWAS, genome-wide association study; HF, heart failure; LQTS, long QT syndrome; MO, morpholino; WES, whole-exome sequencing.
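Fractional shortening, the contractility readout cited above for the HBEGF, SRA1, IK, and BAG3 knockdowns, is conventionally computed from the ventricular diameter at end-diastole (EDD) and end-systole (ESD) as FS (%) = 100 × (EDD − ESD) / EDD. The Python sketch below is a minimal illustration assuming the diameters have already been measured (for example from videos of embryonic hearts) in consistent units; the function names are hypothetical, and the numbers in the usage line are illustrative values, not data from the cited studies.

```python
from statistics import mean
from typing import Iterable, Tuple


def fractional_shortening(end_diastolic: float, end_systolic: float) -> float:
    """FS (%) = 100 * (EDD - ESD) / EDD for one cardiac cycle."""
    if end_diastolic <= 0:
        raise ValueError("end-diastolic diameter must be positive")
    if not 0 <= end_systolic <= end_diastolic:
        raise ValueError("end-systolic diameter must lie between 0 and the end-diastolic diameter")
    return 100.0 * (end_diastolic - end_systolic) / end_diastolic


def mean_fractional_shortening(cycles: Iterable[Tuple[float, float]]) -> float:
    """Average FS over several measured cycles given as (EDD, ESD) pairs."""
    values = [fractional_shortening(edd, esd) for edd, esd in cycles]
    if not values:
        raise ValueError("at least one cardiac cycle is required")
    return mean(values)


if __name__ == "__main__":
    # Illustrative diameters in micrometres (hypothetical values).
    print(mean_fractional_shortening([(100.0, 80.0), (50.0, 40.0)]))  # 20.0
```

Averaging over several cycles per embryo, and then comparing morphants with control-injected siblings, is one reasonable design; the cited studies may have used different measurement protocols.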

Other studies did not use zebrafish to validate their candidate genes; however, independent follow-up studies confirmed the disease-causing potential of some of them. For example, two independent GWA studies on DCM and heart failure found SNPs near the HSPB7 (heat shock protein family B member 7) gene to be associated with the disease (74, 77). Three years later, Rosenfeld et al. confirmed the requirement of HSPB7 for heart function and structure by MO-based knockdown experiments. HSPB7 depletion led to impaired cardiac morphogenesis due to defects in ventricular size, but also due to an early block of heart tube formation (75). Using a TALEN-mediated knockout of HSPB7, the same group recently showed that loss of HSPB7 increases the susceptibility of adult mutant zebrafish to cardiomyopathy due to impaired protein homeostasis, providing further support for the initial GWAS findings (76). Moreover, this zebrafish study and most of those mentioned above not only validate candidate genes from GWA studies but also allow a more detailed investigation of the underlying pathomechanisms and help to identify novel disease-associated pathways and protein networks.

# CONCLUSION AND FUTURE PERSPECTIVES

The zebrafish offers a variety of advantages as a complement to GWAS. Zebrafish are easy to keep, handle, and image, show many physiological and genetic similarities to humans, and are highly suitable for genetic manipulation. These features help to establish valid disease models and allow a plethora of follow-up studies.

Due to obvious differences in morphology and living environment, a simple translation of findings from the zebrafish system to humans is not always possible. In addition, probably the biggest difference, and a peculiarity of the zebrafish heart compared with mammals, is its ability to regenerate after injury (89). This may result in drawbacks and limitations when comparing pathomechanisms in fish and humans, but it also makes the zebrafish a highly interesting model to study the mechanisms of regeneration (90, 91). It is important to mention that if the disease association of a particular gene cannot be confirmed in zebrafish, this does not necessarily mean that the gene is not causative for the phenotype. Mechanisms such as intrinsic repair processes or genetic compensation may mask a causative effect of a gene mutation. In such situations, other experimental models, such as rodents or patient-derived iPSCs (induced pluripotent stem cells), might lead to clearer results (92, 93). Nevertheless, all of the benefits mentioned make the zebrafish a valid and highly suitable model to investigate cardiovascular pathologies and to validate findings from GWAS. Many SNPs identified in GWA studies are located in non-coding regions of the genome and might affect, for example, enhancer or repressor binding, microRNA binding sites, or chromatin structure. For these kinds of variants, too, the zebrafish system can help to identify their in vivo relevance, biological role, and underlying pathomechanisms. Madelaine et al. (94), for example, very recently confirmed human disease-associated SNPs in CNEs (conserved non-exonic elements) by using CRISPR/Cas9-mediated deletion of the respective non-coding locus (94).

We are confident that the fruitful synergy between GWAS and zebrafish in cardiac research will expand in the future, will lead to the identification of novel disease-causing genes and variants, and will help to screen for possible therapeutic strategies.

### AUTHOR CONTRIBUTIONS

CP, FD, D-DP, WR, and SJ contributed substantially to the conception, drafting, and revision of the manuscript and approved the final version.

### FUNDING

We gratefully acknowledge financial support from the Deutsche Forschungsgemeinschaft (DFG) RO2173/5-1 (WR) and JU2859/2-1 (SJ), the German Federal Ministry of Education and Research (BMBF) (e:Med-SYMBOL-HF grant #01ZX1407A) (SJ), and the Ministerium für Wissenschaft, Forschung und Kunst Baden-Württemberg (MWK) Juniorprofessurenprogramm (SJ).




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Paone, Diofano, Park, Rottbauer and Just. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Using Gene Expression to Annotate Cardiovascular GWAS Loci

### Matthias Heinig<sup>1,2</sup>\*

<sup>1</sup> Institute of Computational Biology, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany; <sup>2</sup> Department of Informatics, Technical University of Munich, Munich, Germany

Genetic variants at hundreds of loci associated with cardiovascular phenotypes have been identified by genome-wide association studies. Most of these variants are located in intronic or intergenic regions, rendering functional and mechanistic follow-up difficult. These non-protein-coding regions harbor regulatory sequences. Thus, the study of genetic variants associated with transcription, so-called expression quantitative trait loci, has emerged as a promising approach to identify regulatory sequence variants. The genes and pathways they control constitute candidate causal drivers at cardiovascular risk loci. This review provides an overview of the expression quantitative trait loci resources available for cardiovascular genetics research and of the most commonly used approaches for candidate gene identification.

### Edited by:

Tanja Zeller, Universität Hamburg, Germany

### Reviewed by:

Hauke Busch, Universität zu Lübeck, Germany; Ville-Petteri Makinen, South Australian Health and Medical Research Institute, Australia; Saikat Banerjee, Max-Planck-Institut für Biophysikalische Chemie, Germany

\*Correspondence:

Matthias Heinig, matthias.heinig@helmholtz-muenchen.de

### Specialty section:

This article was submitted to Cardiovascular Genetics and Systems Medicine, a section of the journal Frontiers in Cardiovascular Medicine

> Received: 26 March 2018 Accepted: 15 May 2018 Published: 05 June 2018

### Citation:

Heinig M (2018) Using Gene Expression to Annotate Cardiovascular GWAS Loci. Front. Cardiovasc. Med. 5:59. doi: 10.3389/fcvm.2018.00059

Keywords: eQTL, expression quantitative trait loci, genome wide association study, GWAS, cardiovascular disease

# BACKGROUND

The ultimate goal of any genetic association analysis is to identify genetic variation linked to variation in a phenotype and to elucidate the molecular mechanisms that are altered by the sequence variation. Genome-wide association studies have been tremendously successful in identifying thousands of disease-associated loci, as documented by the steady growth of the continuously updated GWAS catalog (1). This progress has also highlighted hundreds of loci associated with cardiovascular phenotypes: the current GWAS catalog (2) lists 249 distinct chromosomal regions associated with coronary artery disease (with candidate genes and pathways at many loci summarized in Klarin et al. (3)), 138/115 with diastolic/systolic blood pressure, and 109 with QT interval, to name just the top three cardiovascular phenotypes. Follow-up analyses of these loci aim to establish the causal mechanisms underlying the statistical associations. In classical family-based linkage studies, which typically identify rare variants with very large effect sizes, the causal variants are usually located in the protein-coding sequence and have a strong impact on protein function (4); for instance, truncating mutations in the sarcomeric protein TTN cause dilated cardiomyopathy (5–8). In GWAS, however, the identification of causal variants has proved very challenging, since the vast majority of disease-associated variants are located either in introns of genes or in intergenic regions (2). Therefore, the classical approach of identifying the variant with the strongest impact on protein function, such as a gained stop codon, is not sufficient.

Heinig Annotating Cardiovascular GWAS With eQTLs

Recent large-scale efforts have annotated a plethora of functional regulatory elements, such as enhancers, residing in the non-protein-coding part of the genome (9, 10). An alternative mechanism might therefore be that disease-associated regulatory variants alter the sequence and function of such regulatory elements. Indeed, a systematic analysis of the location of disease-associated variants showed that they preferentially reside in regulatory elements (11, 12). Since regulatory elements are highly tissue specific, this information can even be used to identify the disease-relevant tissues (11, 12). These results from localization analysis strongly suggest that disease-associated variants alter regulatory elements. It now remains to be shown that these elements are indeed altered and to identify the respective target genes whose transcription they control.
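In its simplest form, asking whether disease-associated variants preferentially reside in regulatory elements is a 2×2 contingency-table question. As a hedged illustration, and not necessarily the method used in the cited analyses, the Python sketch below computes an odds ratio and a one-sided Fisher's exact (hypergeometric) P-value using only the standard library. The function name is hypothetical and the counts in the usage line are invented for illustration.

```python
from math import comb


def regulatory_enrichment(disease_in: int, disease_out: int,
                          background_in: int, background_out: int) -> tuple[float, float]:
    """Odds ratio and one-sided Fisher's exact P-value for enrichment.

    Rows: disease-associated SNPs vs. background SNPs.
    Columns: SNPs inside vs. outside annotated regulatory elements.
    """
    n_total = disease_in + disease_out + background_in + background_out
    n_inside = disease_in + background_in   # column total: SNPs inside elements
    n_disease = disease_in + disease_out    # row total: disease-associated SNPs
    denom = comb(n_total, n_disease)
    # P(X >= observed) under the hypergeometric null of no enrichment;
    # math.comb returns 0 when k > n, so impossible tables contribute nothing.
    p_value = sum(
        comb(n_inside, k) * comb(n_total - n_inside, n_disease - k)
        for k in range(disease_in, min(n_inside, n_disease) + 1)
    ) / denom
    if disease_out * background_in == 0:
        odds_ratio = float("inf")
    else:
        odds_ratio = (disease_in * background_out) / (disease_out * background_in)
    return odds_ratio, p_value


if __name__ == "__main__":
    # Invented counts: 8 of 10 disease SNPs vs. 20 of 90 background SNPs in elements.
    print(regulatory_enrichment(8, 2, 20, 70))
```

Genome-wide analyses additionally have to match background SNPs for properties such as allele frequency and LD, which this toy test ignores.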

Integrated analysis of the genetics of gene expression provides an elegant way of directly assessing the consequences of putative regulatory sequence variants on transcription. In this study design (13), a population cohort is characterized for genome-wide patterns of genetic variation and for genome-wide gene expression. Gene expression levels are treated as quantitative traits and systematically tested for association with sequence variants. Significant associations are called expression quantitative trait loci (eQTL). These eQTL identify not only putative regulatory variants but also their target genes, namely the genes whose expression is associated with the variant (14, 15). Because biological information processing and regulation are not limited to transcription, this approach has also been generalized to other intermediate molecular traits such as DNA methylation (16, 17), open chromatin (18), histone modifications (19–21), gene, exon, and transcript expression levels (22–26), translation and protein levels (27), as well as metabolites (28, 29). In particular, information from the epigenome can be used to identify regulatory variants and to characterize their role in disease (11, 18, 21, 27).
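The core of this study design, treating expression as a quantitative trait and testing it against allele dosage, can be sketched as a per-pair linear regression. The Python code below is a minimal illustration under simplifying assumptions: no covariates (real pipelines adjust for known and hidden confounders), a normal approximation to the t distribution for the P-value, and a plain Bonferroni threshold over all tested variant-gene pairs. All names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def eqtl_association(dosage: Sequence[float],
                     expression: Sequence[float]) -> Tuple[float, float, float]:
    """Regress expression on allele dosage (0/1/2) by ordinary least squares.

    Returns (beta, t statistic, two-sided P-value). The P-value uses a
    normal approximation to the t distribution, which is only adequate
    for reasonably large cohorts (simplifying assumption of this sketch).
    """
    n = len(dosage)
    if n != len(expression) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mean_g = sum(dosage) / n
    mean_e = sum(expression) / n
    sxx = sum((g - mean_g) ** 2 for g in dosage)
    if sxx == 0:
        raise ValueError("variant is monomorphic in this sample")
    sxy = sum((g - mean_g) * (e - mean_e) for g, e in zip(dosage, expression))
    beta = sxy / sxx
    intercept = mean_e - beta * mean_g
    rss = sum((e - intercept - beta * g) ** 2 for g, e in zip(dosage, expression))
    se = math.sqrt(rss / (n - 2) / sxx)
    if se == 0:  # perfect fit: treat as infinitely significant
        return beta, math.inf, 0.0
    t = beta / se
    return beta, t, math.erfc(abs(t) / math.sqrt(2))


def scan_eqtls(genotypes: Dict[str, Sequence[float]],
               expression: Dict[str, Sequence[float]],
               alpha: float = 0.05) -> List[Tuple[str, str, float, float]]:
    """Test every variant-gene pair and keep those passing a Bonferroni threshold."""
    threshold = alpha / (len(genotypes) * len(expression))
    hits = []
    for snp, dosage in genotypes.items():
        for gene, levels in expression.items():
            beta, _, p = eqtl_association(dosage, levels)
            if p < threshold:
                hits.append((snp, gene, beta, p))
    return sorted(hits, key=lambda hit: hit[3])
```

Testing every variant against every gene, as here, covers both cis and trans associations; restricting tests to nearby genes reduces the multiple-testing burden considerably.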

# eQTL RESOURCES FOR CARDIOVASCULAR GENETICS

Regulatory elements, and also the effects of variants on those elements, can be highly tissue specific; it is therefore key to investigate the tissue relevant for the disease (11, 12, 25, 30). Because biopsies of tissues relevant for cardiovascular diseases, in particular of the heart, are very difficult to obtain from humans, it is not surprising that early applications of eQTL analysis to identify candidate genes for cardiovascular phenotypes were reported in animal models (31). To understand the regulatory impact of sequence variants in humans, samples of disease-relevant tissues are often obtained during surgery, from organ donors, or from post-mortem sections. As a consequence of these practical considerations, the transcriptome data might be confounded by differences in tissue composition (32) or by the ischemic time of post-mortem samples (25). Therefore, additional care has to be taken in data analysis to account for observed and hidden confounders (33). Current reviews provide an overview of recent human eQTL studies (15, 34). The most comprehensive study to date is the Genotype-Tissue Expression (GTEx) project, which aims to characterize regulatory sequence variants across 44 distinct tissues from post-mortem sections (26). These include cardiac tissues (left ventricle, atrial appendage), vascular tissues (aorta, tibial artery, coronary artery), and metabolic tissues (liver, subcutaneous and visceral adipose tissue) (**Table 1**). In terms of sample size and coverage of tissues of interest, the eQTL data generated in the STARNET consortium are currently the most comprehensive resource (38). This resource focuses on vascular and metabolic tissues in patients with coronary artery disease. It has been shown that eQTL are sometimes dependent on the disease context (32). This observation is also supported by the finding that more eQTLs associated with disease SNPs can be found in diseased populations (38). Formation of atherosclerotic plaques is an inflammatory process; therefore, immune cells such as monocytes or macrophages are also considered disease-relevant and have been extensively profiled (39). Since the disease-relevant tissues are not always known a priori, efforts are currently underway to establish cohorts of induced pluripotent stem cells that can potentially be differentiated into any cell type for genetic mapping (40). These eQTL projects are complemented by large-scale projects that aim to create a reference map of regulatory elements across an exhaustive set of 111 human cell types and tissues by annotation with epigenetic markers of regulatory elements (10). They are further complemented by recent sequencing-based methods (e.g., Hi-C) to study chromosomal architecture (41) in a wide variety of human tissues (42), including heart, liver, and aorta. These techniques can identify promoter-enhancer interactions and have already been used successfully to identify IRX3 as the causal gene underlying an obesity GWAS hit located in an intron of the FTO gene (43).

TABLE 1 | Recent cardiovascular eQTL resources.

# CANDIDATE IDENTIFICATION STRATEGIES

### cis eQTL Candidate Genes

Overlapping eQTL and GWAS SNPs is the most straightforward approach to identify candidate genes for GWAS hits. If a GWAS SNP is also an eQTL for a nearby gene, or is in tight linkage disequilibrium (LD) with such an eQTL, it is conceivable that the SNP indeed affects a regulatory element controlling the expression of the respective gene. Such associations are typically called cis-eQTL when the distance between gene and variant is no more than 500 kb to 1 Mb, as opposed to trans-eQTL, where the distances are greater or the variant and gene are located on different chromosomes. Cardiovascular candidate genes such as SORT1 (44) and LIPA (45) have been identified via cis-eQTL. It has been demonstrated that these candidate genes frequently are not the genes located closest to the GWAS SNP, both for heart-related traits (32) and more generally for any GWAS trait (25, 26). Nowadays, this candidate annotation approach is becoming a standard analysis included in many GWAS papers and can be performed conveniently using the online software FUMA (46). For instance, a recent GWAS on CAD (47) identified eQTL for 196 genes at 97 of the 161 CAD loci found in the analysis, using GTEx and other eQTL databases. This result already demonstrates one caveat of the approach: several candidate genes might emerge for a locus and might be inconsistent between tissues, or GWAS variants might also associate with eQTL by chance (26). In this particular example, 36 loci have unique candidate genes and an additional 24 loci have candidate genes detected consistently across tissues, so 60 loci can be annotated confidently. Overall, a highly significant enrichment of trait-associated SNPs can be observed among eQTLs, as demonstrated for heart-related traits (32). Trans-eQTL are less frequently considered for the annotation of GWAS SNPs, as they do not readily provide a clear mechanistic explanation.
Nevertheless, it has been shown in a systematic analysis of GWAS variants, that they frequently also associate with expression levels of genes distant to the GWAS locus (48).
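The overlap strategy described above can be sketched in a few lines of Python. The code below is a minimal illustration under stated assumptions: the cis window is fixed at 1 Mb (the upper end of the 500 kb to 1 Mb range), distance is measured to the gene's transcription start site, and LD is supplied as a precomputed, hypothetical lookup of r² values. The record type, function names, threshold, and gene names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, List

CIS_WINDOW_BP = 1_000_000  # assumption: upper end of the 500 kb-1 Mb convention


@dataclass(frozen=True)
class EQTL:
    snp: str
    gene: str
    snp_chrom: str
    snp_pos: int
    gene_chrom: str
    gene_tss: int  # transcription start site of the gene


def is_cis(e: EQTL, window: int = CIS_WINDOW_BP) -> bool:
    """cis if variant and gene lie on the same chromosome within `window` bp."""
    return e.snp_chrom == e.gene_chrom and abs(e.snp_pos - e.gene_tss) <= window


def cis_candidate_genes(lead_snps: Iterable[str],
                        eqtls: Iterable[EQTL],
                        ld_r2: Dict[FrozenSet[str], float],
                        r2_threshold: float = 0.8) -> Dict[str, List[str]]:
    """Map each GWAS lead SNP to genes with a cis-eQTL at the lead SNP or a SNP in tight LD."""
    cis_eqtls = [e for e in eqtls if is_cis(e)]
    candidates: Dict[str, List[str]] = {}
    for lead in lead_snps:
        genes = set()
        for e in cis_eqtls:
            r2 = 1.0 if e.snp == lead else ld_r2.get(frozenset((lead, e.snp)), 0.0)
            if r2 >= r2_threshold:
                genes.add(e.gene)
        candidates[lead] = sorted(genes)
    return candidates
```

The toy output makes the caveat discussed above concrete: one lead SNP can map to several candidate genes, and nothing in the overlap itself says which of them, if any, is causal.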

An important limitation of the overlap-based strategy is that it cannot be used to establish causality. Strictly speaking, the observational study design only allows causality to be inferred in a statistical sense. In genetic associations, the direction of causality is always fixed (**Figure 1A**): the genotype can influence gene expression or the phenotype, but not vice versa. To establish a causal chain between genetic variation, gene expression, and the disease phenotype in the strict sense, an interventional experiment would be required, in which all other confounding factors that could determine the phenotype are fixed and only the gene expression level is manipulated to test for an effect on the phenotype. If gene expression is indeed causal for the phenotype, any change in gene expression would necessarily cause a change in the phenotype. In the concept of Mendelian randomization (MR), a genetic variant is considered as an instrumental variable controlling the level of gene expression, and its effect on the phenotypic outcome is observed (49). In analogy to randomized controlled trials, individuals are assigned to a group based on their genotype. Because the direction of causality between genetic variant and gene expression is fixed and the genetic variant is robustly associated with expression levels, one group will receive a higher "dose" of gene expression. Assuming that the genotype is independent of confounding factors (**Figure 1A**), changes in phenotypic outcome can be attributed to the changes in gene expression.
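The instrumental-variable logic can be illustrated with a small simulation. In the Python sketch below, all effect sizes are arbitrary assumptions chosen for illustration: a variant G changes expression X, an unmeasured confounder U affects both X and the phenotype Y, and the true causal effect of X on Y is 0.3. The naive regression of Y on X is biased by U. The Wald ratio (effect of G on Y divided by effect of G on X), the simplest single-instrument MR estimator, recovers the causal effect because G is independent of U. Function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def ols_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Least-squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx


def simulate_cohort(n: int = 20_000, g_to_x: float = 0.5, x_to_y: float = 0.3,
                    allele_freq: float = 0.3,
                    seed: int = 1) -> Tuple[List[int], List[float], List[float]]:
    """Simulate genotype G, expression X and phenotype Y with a hidden confounder U."""
    rng = random.Random(seed)
    g, x, y = [], [], []
    for _ in range(n):
        dosage = sum(rng.random() < allele_freq for _ in range(2))  # 0, 1 or 2 effect alleles
        u = rng.gauss(0.0, 1.0)  # confounder, independent of the genotype
        expr = g_to_x * dosage + u + rng.gauss(0.0, 1.0)
        pheno = x_to_y * expr + u + rng.gauss(0.0, 1.0)
        g.append(dosage)
        x.append(expr)
        y.append(pheno)
    return g, x, y


if __name__ == "__main__":
    g, x, y = simulate_cohort()
    naive = ols_slope(x, y)                   # confounded; expected about 0.78 with these settings
    wald = ols_slope(g, y) / ols_slope(g, x)  # MR Wald ratio; expected close to 0.3
    print(f"naive: {naive:.2f}  Wald ratio: {wald:.2f}")
```

The estimate is only valid under the usual instrument assumptions: the variant must affect Y solely through X and must be independent of U. In this simulation both hold by construction, but in real data they do not hold automatically.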

Classically, MR and similar approaches to statistically establish causality (50, 51) require all variables to be measured in the same population (**Figure 1B**). This is often not feasible, as gene expression profiling in each and every disease cohort is prohibitively expensive. In practice, GWAS SNPs and eQTLs are identified in separate populations. Because of data privacy regulations, a researcher often only has access to the full individual-level data of one population and to the summary statistics of the other. Depending on which full data set is available, several methods exist that directly integrate the measured data with summary statistics (52–55). A Bayesian co-localization approach based on summary statistics (56) tests whether the co-localization of two association signals is compatible with a common underlying causal variant and has been successfully applied to blood lipid traits and liver eQTL. An alternative approach is to impute gene expression levels (57) into a GWAS population (54, 58) using eQTL summary statistics from an eQTL reference population. Subsequently, the imputed gene expression can be correlated with the disease phenotype to identify candidate genes (54, 58). Alternatively, the transcriptome-wide association study (TWAS) method (54) and other methods (Barbeira et al. in review) can also work completely without individual-level data by indirectly associating expression and phenotype using eQTL and GWAS summary statistics and the LD structure between SNPs. The TWAS approach showed superior power compared with colocalization analysis and simple overlap-based analysis in cases where the causal variants are not directly observed or where multiple causal variants affecting expression and phenotype exist. Consistent with other candidate identification strategies, analysis of obesity-related traits with TWAS showed that 66% of identified trait-associated genes were not the closest gene (54).
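Both variants of expression imputation described above can be written compactly. Assume per-SNP expression weights w have been learned in the eQTL reference population. With individual-level data, predicted expression is the weighted sum of allele dosages. With summary statistics only, the association is commonly expressed as the TWAS z-score w'z / sqrt(w'Σw), where z are the GWAS z-scores and Σ is the SNP-SNP LD correlation matrix. The Python sketch below is a hedged illustration of these two formulas; it does not include weight training, and the function names are hypothetical.

```python
import math
from typing import Sequence


def impute_expression(weights: Sequence[float], dosages: Sequence[float]) -> float:
    """Genetically predicted expression for one individual: sum_i w_i * dosage_i."""
    if len(weights) != len(dosages):
        raise ValueError("one weight per SNP dosage is required")
    return sum(w * d for w, d in zip(weights, dosages))


def twas_z(weights: Sequence[float], gwas_z: Sequence[float],
           ld: Sequence[Sequence[float]]) -> float:
    """Summary-statistic TWAS z-score: w'z / sqrt(w' LD w)."""
    k = len(weights)
    if len(gwas_z) != k or len(ld) != k:
        raise ValueError("weights, z-scores and LD matrix must refer to the same SNPs")
    numerator = sum(w * z for w, z in zip(weights, gwas_z))
    # Quadratic form w' LD w: variance of the weighted sum of z-scores under the null.
    variance = sum(weights[i] * ld[i][j] * weights[j] for i in range(k) for j in range(k))
    if variance <= 0:
        raise ValueError("w' LD w must be positive")
    return numerator / math.sqrt(variance)
```

The LD term is what lets the method work without individual-level data. Two correlated SNPs carry partly redundant evidence, so a positive off-diagonal entry enlarges the denominator and shrinks the z-score.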
Summary data-based Mendelian randomization (SMR) is a method that can be used if only summary statistics are available for both the eQTL and the GWAS results. The method uses standard two-sample MR (59) to identify causal or pleiotropic effects of sequence variants on gene expression and phenotypes, and it distinguishes this situation from distinct causal variants in LD by a test based on multiple SNPs (55). Similar to results from TWAS analyses, the application of this method to five common diseases showed that only 60% of the identified candidate genes are the closest gene to the GWAS SNP.
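For a single variant z with GWAS effect b_zy and eQTL effect b_zx, the SMR estimate of the effect of expression on the phenotype is the ratio b_xy = b_zy / b_zx. Its test statistic is commonly written in the approximate form T = z_zy² z_zx² / (z_zy² + z_zx²), which is compared with a chi-square distribution with one degree of freedom. The Python sketch below is a hedged illustration of that single-SNP step only. It does not implement the multi-SNP heterogeneity (HEIDI) test that separates pleiotropy from linkage, and the function name is hypothetical.

```python
import math
from typing import Tuple


def smr_test(b_zy: float, se_zy: float, b_zx: float, se_zx: float) -> Tuple[float, float, float]:
    """Single-SNP SMR estimate from GWAS (z -> y) and eQTL (z -> x) summary statistics.

    Returns (b_xy, T_SMR, P). b_xy is the Wald-type ratio b_zy / b_zx;
    T_SMR uses the approximate chi-square (1 df) statistic.
    """
    if b_zx == 0:
        raise ValueError("the variant must have a non-zero effect on expression")
    z_zy = b_zy / se_zy
    z_zx = b_zx / se_zx
    b_xy = b_zy / b_zx
    t_smr = (z_zy ** 2 * z_zx ** 2) / (z_zy ** 2 + z_zx ** 2)
    # Upper tail of a chi-square distribution with 1 df: P(Z^2 > t) = erfc(sqrt(t / 2)).
    p_value = math.erfc(math.sqrt(t_smr / 2))
    return b_xy, t_smr, p_value
```

Because T is dominated by the weaker of the two z-scores, a locus needs reasonably strong GWAS and eQTL signals at the same variant to reach significance.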

### Network Based Analysis

Genes do not act in isolation but rather form functionally related pathways and networks. Pathways are usually defined based on curated prior knowledge about well-studied processes such as biochemical reactions and signaling pathways (KEGG, Reactome, GO). Pathways can be represented as sets of genes involved in the same process or as networks preserving the topological information on which genes are connected to one another, for instance by catalyzing adjacent steps in a metabolic pathway. Alternatively, networks can be derived from high-throughput experiments such as transcriptome profiling (co-expression networks) or protein-protein interaction (PPI) screening (PPI networks). Pathways and networks defined either from prior knowledge or from data can subsequently be used to interpret disease associations derived from GWAS. Representing pathways as sets of genes, one can ask whether a set of genes shows stronger evidence of association with disease than random gene sets of the same size. Because GWAS test individual SNPs rather than genes, a mapping between SNPs and genes is required, for instance based on genomic positions. Methods such as SNP set enrichment analysis (60, 61) can then be used to test the statistical significance of the association between gene sets and the GWAS results by comparing the distribution of GWAS P-values of SNPs within the pathway to a background distribution. These methods have been applied to show associations between CAD and pathways for lipid metabolism, coagulation, and immunity (62).

Since eQTL experiments require transcriptome profiling in large cohorts, it is natural to use these data to define data-driven gene co-expression networks and gene sets, so-called co-expression modules. These gene sets are then annotated according to gene function or cell-type specificity and related to disease via GWAS results using SNP set enrichment analysis. The link between genes and SNPs can naturally be established via the cis-eQTLs of the genes in a co-expression module. This approach was also used in the CAD study mentioned above (62). Note that co-expression modules do not necessarily overlap fully with biochemical pathways, although they might represent the same disease process. For instance, a module might contain transcriptional regulators together with parts of the biochemical process that they control.
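
A minimal sketch of deriving co-expression modules from an expression matrix, assuming a hard correlation threshold (|r| ≥ 0.8, an arbitrary choice) and treating the connected components of the resulting gene network as modules. Published analyses typically use more elaborate schemes (for example, soft thresholding and hierarchical clustering); the function and the simulated genes A to F are illustrative only.

```python
import numpy as np

def coexpression_modules(expr, genes, r_min=0.8):
    """Group genes into co-expression modules.

    expr:  samples x genes expression matrix
    genes: gene names, one per column of expr
    Genes are linked when |Pearson r| >= r_min; modules are the connected components.
    """
    corr = np.corrcoef(expr, rowvar=False)  # genes x genes correlation matrix
    adj = np.abs(corr) >= r_min
    np.fill_diagonal(adj, False)  # ignore self-correlation
    unvisited = set(range(len(genes)))
    modules = []
    while unvisited:
        # Depth-first search collects every gene reachable from a seed gene
        stack = [unvisited.pop()]
        component = []
        while stack:
            g = stack.pop()
            component.append(genes[g])
            for nb in np.flatnonzero(adj[g]).tolist():
                if nb in unvisited:
                    unvisited.remove(nb)
                    stack.append(nb)
        modules.append(sorted(component))
    return sorted(modules)

# Simulated data: genes A-C follow one latent factor, D-E another, F is pure noise
rng = np.random.default_rng(0)
n = 50
f1, f2 = rng.normal(size=n), rng.normal(size=n)
expr = np.column_stack([
    f1 + rng.normal(scale=0.1, size=n),
    f1 + rng.normal(scale=0.1, size=n),
    -f1 + rng.normal(scale=0.1, size=n),  # anti-correlated gene joins via |r|
    f2 + rng.normal(scale=0.1, size=n),
    f2 + rng.normal(scale=0.1, size=n),
    rng.normal(size=n),
])
modules = coexpression_modules(expr, list("ABCDEF"))
```

Using |r| places a regulator and a gene it represses in the same module, which is one reason modules need not coincide with curated pathways.
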

Network topology of co-expression networks is often used to prioritize candidate genes, based on the assumption that genes with many network connections (so-called hubs) are more important (38, 62–65). A study investigating shared molecular networks and their drivers across cardiovascular diseases and type 2 diabetes applied this strategy (64). Knockout mice for selected key driver genes indeed showed metabolic phenotypes and gene expression changes in the network neighborhood of the key drivers. Similarly, several studies on CAD identified key driver genes and provided evidence for their functional involvement in mouse (65) and in vitro studies (62, 65).
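
Hub-based prioritization starts from each gene's number of connections (its degree). The sketch below ranks genes in a small hypothetical undirected network by degree. It is a simplification: key driver analyses such as those cited above may additionally consider, for example, the enrichment of disease-associated genes in a gene's network neighborhood. Gene names G1 to G5 are placeholders.

```python
from collections import Counter

def rank_hubs(edges, top=3):
    """Rank genes by degree, i.e. their number of distinct network neighbours.

    edges: (gene_a, gene_b) pairs from an undirected network, e.g. a thresholded
           co-expression or protein-protein interaction network
    """
    # Normalise edge orientation, then drop duplicates and self-loops
    unique = {tuple(sorted(e)) for e in edges if e[0] != e[1]}
    degree = Counter()
    for a, b in unique:
        degree[a] += 1
        degree[b] += 1
    # Highest degree first; ties broken alphabetically for reproducibility
    return sorted(degree.items(), key=lambda kv: (-kv[1], kv[0]))[:top]

edges = [("G1", "G2"), ("G1", "G3"), ("G1", "G4"),
         ("G2", "G3"), ("G3", "G1"), ("G5", "G4")]
print(rank_hubs(edges))
```
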

# CONCLUSIONS

eQTL data provide first leads toward uncovering the mechanisms underlying the statistical associations observed between genetic loci and common cardiovascular diseases. Several major challenges must be overcome before this approach can be broadly applied. First, regulatory elements, and therefore also the regulatory impact of sequence variation, are highly cell-type specific. The GTEx project is addressing this challenge by providing a large-scale cross-tissue eQTL database. However, not all conceivable tissues and cell types can be systematically analyzed. In particular, transient developmental stages might leave a lasting phenotypic footprint. Induced pluripotent stem cells from cohorts offer an elegant solution (40), as they can potentially be differentiated into any cell type or developmental stage (Nguyen et al. in review) and studied for eQTLs. A second challenge is the variability of genetic effects on expression between the different cells making up a tissue, and even between cells of the same cell type. eQTL mapping based on single-cell transcriptomic data is becoming feasible (66) and can be used to quantify and map the genetic determinants of cell-to-cell variability in gene expression. Lastly, the grand challenge is to move from correlation or co-localization toward causation. This is clearly the most difficult task, and it requires experimental validation in addition to rigorous statistical approaches such as MR.

### AUTHOR CONTRIBUTIONS

The author confirms being the sole contributor of this work and approved it for publication.

### FUNDING

This work was supported by funding to MH by the Federal Ministry of Education and Research (BMBF, Germany) in the projects eMed:symAtrial (01ZX1408D) and eMed:confirm (01ZX1708G).


**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Heinig. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Scientific Contributions of Population-Based Studies to Cardiovascular Epidemiology in the GWAS Era

*Wolfgang Lieb 1\* and Ramachandran S. Vasan 2,3*

*1 Institute of Epidemiology, Kiel University, Kiel, Germany, 2 Framingham Heart Study (FHS), Framingham, MA, United States, 3 Section of Preventive Medicine and Epidemiology, Boston University School of Medicine, Boston, MA, United States*

Longitudinal, well-phenotyped, population-based cohort studies offer unique research opportunities in the context of genome-wide association studies (GWAS), including GWAS for new-onset (incident) cardiovascular disease (CVD) events, the assessment of gene x lifestyle interactions, and the evaluation of the incremental predictive utility of genetic information in apparently healthy individuals. Furthermore, comprehensively phenotyped community-dwelling samples have contributed to GWAS of numerous traits that reflect normal organ function (e.g., cardiac structure and systolic and diastolic function) and of many traits along the CVD continuum (e.g., risk factors, circulating biomarkers, and subclinical disease traits). These GWAS have identified many genetic loci implicated in normal organ function and in different stages of the CVD continuum. Finally, population-based cohort studies have made important contributions to Mendelian Randomization analyses, a statistical approach that uses genetic information to assess observed associations between cardiovascular traits and clinical CVD outcomes for potential causality.

Keywords: GWAS (genome-wide association study), population, genetic variation, genetic predisposition to disease, risk prediction

# What Are Key Features of Population-Based Cohort Studies?

As a brief introduction, we would like to highlight important design features of population-based studies. First, as opposed to hospital-based referral samples, population-based epidemiological studies examine community-dwelling or random samples from the general population. As such, study participants are not selected based on a given disease but rather to represent the general population of the areas sampled, so that observations from such a sample are generalizable to the underlying source population. It has to be kept in mind, though, that the response rate of some landmark cohort studies is rather low [e.g., 5.5% for the UK Biobank (1)], which increases the potential for selection bias (2). Second, most population-based studies are longitudinal studies that re-examine their participants every few years, so that repeated measures of several traits are available and trajectories over time (and their genetic underpinning) can be assessed, as opposed to analyses of single-occasion measurements of select traits in typical referral samples. Thus, population-based cohort studies include many individuals who are free of the disease of interest at the beginning of the study but who might develop the condition over the course of the study. Therefore, population-based cohort studies are ideal for studying risk factors and intermediate traits for the development of chronic disease conditions and for estimating measures of disease incidence (3, 4).

### *Edited by:*

*Jeanette Erdmann, Universität zu Lübeck, Germany*

### *Reviewed by:*

*Niek Verweij, University Medical Center, Netherlands; Ilja Demuth, Charité Universitätsmedizin Berlin, Germany*

*\*Correspondence:*

*Wolfgang Lieb wolfgang.lieb@epi.uni-kiel.de*

### *Specialty section:*

*This article was submitted to Cardiovascular Genetics and Systems Medicine, a section of the journal Frontiers in Cardiovascular Medicine*

> *Received: 21 February 2018 Accepted: 11 May 2018 Published: 07 June 2018*

### *Citation:*

*Lieb W and Vasan RS (2018) Scientific Contributions of Population-Based Studies to Cardiovascular Epidemiology in the GWAS Era. Front. Cardiovasc. Med. 5:57. doi: 10.3389/fcvm.2018.00057*

Third, many population-based cohort studies perform deep physiological/clinical and molecular phenotyping of their study participants (5). For example, comprehensive physiological, biochemical, subclinical, and clinical measurements are obtained from the participants using highly standardized methods. Similarly, clinical endpoints are adjudicated in a comprehensive and highly standardized process, which enhances the accuracy and validity of endpoint data from population-based cohort studies. The molecular characterization may include the assessment of common and rare genetic variation and other omics measurements, such as epigenomics, transcriptomics, lipidomics, proteomics, and metabolomics (5). These key features of population-based studies allow specific research questions to be addressed in the context of genome-wide association studies (GWAS). For example, the detailed phenotyping allows comprehensive adjustments and mediation analyses to delineate whether an observed association between a genetic variant and cardiovascular outcomes is independent of traditional risk factors and whether traditional risk factors or biomarkers might mediate the observed association. Overall, population-based studies have made a substantial contribution to scientific discoveries in the GWAS era. A few illustrative highlights of such findings from cohort studies are described below.
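
One simple way to see what a mediation analysis asks is the difference-of-coefficients approach in a linear model: if a variant's association with an outcome shrinks toward zero after adjustment for a candidate mediator, the association may be mediated by that factor. The sketch below simulates hypothetical data in which the variant acts on the outcome only through the mediator. The function name and effect sizes are assumptions; formal mediation analyses additionally estimate the indirect effect and its uncertainty and handle binary or time-to-event outcomes.

```python
import numpy as np

def total_and_direct_effect(g, m, y):
    """Difference-of-coefficients view of mediation using ordinary least squares.

    g: genotype dosage (0/1/2); m: candidate mediator (e.g., a risk factor); y: outcome
    Returns (total, direct): coefficient of g in y ~ g and in y ~ g + m.
    """
    ones = np.ones_like(g, dtype=float)
    total = np.linalg.lstsq(np.column_stack([ones, g]), y, rcond=None)[0][1]
    direct = np.linalg.lstsq(np.column_stack([ones, g, m]), y, rcond=None)[0][1]
    return total, direct

# Simulated (hypothetical) data: the variant raises m, and only m affects y
rng = np.random.default_rng(42)
n = 20_000
g = rng.binomial(2, 0.3, size=n).astype(float)
m = 0.5 * g + rng.normal(size=n)
y = 0.4 * m + rng.normal(size=n)
total, direct = total_and_direct_effect(g, m, y)
# Expected: total near 0.5 * 0.4 = 0.2, direct near 0
```
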

# Reference Sample for Genetic-Epidemiological Analyses

Since many community-dwelling samples are representative of the general population, population-based studies have served as reference ("control") samples for many genetic case-control analyses. In essence, genetic case-control studies compare allelic frequencies of genetic variants in prevalent cases (patients who have the disease of interest when they are sampled) and controls. Ideally, the control sample captures the distribution of the exposure (in this case, the allele frequencies of putative genetic variants) in the source population from which the cases were derived (6). Therefore, population-based studies have provided controls for genetic case-control studies of a broad spectrum of traits, including myocardial infarction (MI)/coronary artery disease (CAD) (7), stroke (8, 9), and dilated cardiomyopathy (10). Importantly, as detailed below, GWAS might reveal different results depending on whether prevalent or incident cases are being analyzed.
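
Comparing allele frequencies between cases and controls reduces to a 2×2 table of allele counts. A minimal sketch of the allelic odds ratio and the Pearson chi-square test (1 df, no continuity correction), using hypothetical counts for 1,000 cases and 1,000 controls; GWAS typically use regression models with covariate adjustment instead, and the allele-based test assumes Hardy-Weinberg equilibrium.

```python
import math

def allelic_test(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Allelic odds ratio and 1-df chi-square test from a 2x2 table of allele counts.

    Counts are alleles (two per individual), not individuals.
    """
    a, b, c, d = case_alt, case_ref, ctrl_alt, ctrl_ref
    odds_ratio = (a * d) / (b * c)
    n = a + b + c + d
    # Pearson chi-square for a 2x2 table (no continuity correction)
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return odds_ratio, chi2, p

# Hypothetical: alternative-allele frequency 0.30 in cases, 0.25 in controls
odds_ratio, chi2, p = allelic_test(600, 1400, 500, 1500)
```
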

# GWAS Analyses for a Broad Spectrum of Phenotypic Traits and Biomarkers Along the Cardiovascular Disease Continuum

The broad and highly standardized phenotyping of study participants has enabled population-based cohort studies to contribute to GWAS in many ways. Specifically, researchers from population-based studies have performed and contributed to numerous GWAS for traits along the cardiovascular disease continuum, including traditional CVD risk factors [e.g., lipids (11), blood pressure (12, 13), and glycemic traits (14)], circulating cardiovascular biomarkers [e.g., B-type natriuretic peptide (BNP) (15), C-reactive protein (16), troponin (17), aldosterone, renin concentration, renin activity (18), adipokines (19), and fibrinogen levels (20)], and subclinical cardiovascular disease traits [such as indices of left ventricular structure and function (21, 22), carotid intima media thickness (IMT) (23), and coronary artery calcification (24)]. Of note, cardiac function can be assessed by different modalities, including ECG, echocardiography, MRI/CT, and circulating biomarkers, and genome-wide genetic analyses have been conducted for several of these traits, including ECG parameters (25), echocardiographic traits (21, 22), MRI measures of cardiac structure and function (26), and relevant biomarkers (15).

It is important to keep in mind that community-based samples (as opposed to clinical samples with established disease) include many individuals free of CVD at the time of inclusion in the study so that population-based cohort studies offer great opportunities to study the development of cardiovascular disease conditions over the adult life course (27), including very early (clinically asymptomatic) stages of the disease process and the genetic underpinning of these early stages. Thus, the above-mentioned GWAS have described to what extent different stages along the CVD continuum are associated with genetic variation and which genes might be involved.

Furthermore, given the large proportion of apparently healthy individuals in population-based cohort studies (as opposed to clinical samples), these studies have conducted GWAS of many traits that reflect relatively normal organ function, including indices of cardiac structure and systolic and diastolic function (21, 22). These studies have provided important insights into how physiological organ function is influenced by genetic variation and how organ dysfunction might contribute to different disease processes (21, 22).

# Assessment of Gene X Lifestyle Interactions

Quantifying the contribution of genes and of different lifestyle factors (and their interactions) to inter-individual variation in cardiovascular risk factor levels and disease risk is an important and growing area of research. Since well-phenotyped cohort studies usually have comprehensive genetic data and detailed lifestyle information available, population-based studies represent an ideal setting to study gene x lifestyle interactions. The interaction of a genetic risk score (based on 50 SNPs) and a lifestyle score (including information on smoking, obesity, physical activity, and diet) on the incidence of CAD has been analyzed in several large community-based cohorts (28). Key observations from these analyses were that (i) the genetic risk score and the lifestyle score were each independently associated with the risk of incident CVD, and (ii) a favorable lifestyle was associated with an almost 50% reduction in the relative risk of CAD compared with an unfavorable lifestyle profile (28). This reduction in the relative risk of CAD with a favorable lifestyle was observed not only in individuals with high genetic risk but also in those with low and intermediate genetic risk (28). Very similar observations were made in more than 270,000 participants of the UK Biobank, in whom a polygenic risk score representing 314 blood pressure (BP)-associated loci, as well as a slightly different lifestyle score (including information on body mass index, healthy diet, sedentary lifestyle, alcohol consumption, smoking, and urinary sodium excretion levels), were related to different BP traits and to incident CVD (29). Both the genetic risk score and the lifestyle score were associated with BP traits and incident CVD. Importantly, a favorable lifestyle as compared to an unfavorable lifestyle was associated with substantially lower average BP values in all categories of genetic risk (low, intermediate, high) and with an approximately 30% lower relative risk of incident CVD (29).
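
The genetic risk scores in these studies aggregate the effects of many SNPs. A minimal sketch, assuming a weighted score (the sum over SNPs of effect size times risk-allele dosage) and, purely for illustration, tertile cut-points for low, intermediate, and high genetic risk. The cited studies define their own weights and categories; function names and numbers are hypothetical.

```python
import numpy as np

def genetic_risk_score(dosages, weights):
    """Weighted genetic risk score: sum over SNPs of effect size x risk-allele dosage.

    dosages: individuals x SNPs matrix of risk-allele counts (0, 1, 2)
    weights: per-SNP effect sizes (e.g., log odds ratios from published GWAS)
    """
    return np.asarray(dosages, dtype=float) @ np.asarray(weights, dtype=float)

def risk_category(scores, cuts=(1 / 3, 2 / 3)):
    """Low / intermediate / high genetic risk by score quantiles (illustrative cut-points)."""
    lo, hi = np.quantile(scores, cuts)
    return np.where(scores <= lo, "low", np.where(scores > hi, "high", "intermediate"))

# Three hypothetical individuals genotyped at three SNPs
dosages = [[0, 1, 2], [2, 2, 2], [0, 0, 0]]
weights = [0.1, 0.2, 0.05]
scores = genetic_risk_score(dosages, weights)
print(risk_category(scores))
```

An interaction analysis would then compare the lifestyle association with disease within each genetic risk category, or add a score-by-lifestyle product term to a regression model.
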

The same lifestyle score as in reference (28) was used in a sample of young women (aged 25 to 40 years) from the Dutch Lifelines cohort to assess the contribution of rare and common genetic variation and of lifestyle factors to very low (≤1st age- and sex-specific percentile) and very high (≥99th age- and sex-specific percentile) levels of LDL cholesterol (LDL-C). About two thirds of the women with very low LDL-C levels had a likely genetic cause (either a relevant mutation in an established gene for monogenic hypocholesterolemia or a very low polygenic risk score), whereas the lifestyle score (28) was not statistically significantly associated with low LDL-C concentrations (30). In women with very high LDL-C, however, an unfavorable lifestyle seems to be more relevant: only about 40% had a genetic cause (relevant mutations in genes for monogenic familial hypercholesterolemia) or predisposition (high polygenic risk score) for high LDL-C, and among the women without a genetic cause for hypercholesterolemia, more than half displayed an unfavorable lifestyle profile (30).

Community-based studies have also been involved in studying uncommon loss-of-function variants that may offer insights into gene function. For example, gain-of-function mutations in the *PCSK9* (proprotein convertase subtilisin/kexin type 9) serine protease gene were initially identified in families with autosomal dominant hypercholesterolemia (31). Subsequently, loss-of-function mutations were reported in individuals with low circulating low-density lipoprotein (LDL) cholesterol levels (32). Analyses in population-based studies revealed that low-frequency sequence variants in the *PCSK9* gene and a *PCSK9* genetic score were associated with lower circulating LDL cholesterol levels and a reduced risk of cardiovascular events in the general population (33, 34). Recently, PCSK9 inhibitors have been tested in randomized controlled trials (35).

# The Genetic Underpinning of Change in Cardiovascular Traits Over the Life Course

Owing to the availability of repeated measures over time, cohort studies are also suitable for exploring the genetic underpinning of changes in cardiovascular risk factors over time and of the longitudinal progression of subclinical CVD traits. For example, a GWAS of carotid IMT measured at different time points over a 10-year period has recently been published (36). Furthermore, several researchers have assessed the association of risk factor-associated genetic variants with trajectories of the respective risk factor over the life course. For example, BMI-associated genetic variants have been related to repeated measures of BMI over time (37). Interestingly, BMI in childhood and BMI in adulthood were associated with different sets of single nucleotide polymorphisms (SNPs) (37), consistent with the concept that genetic effects on risk factors might be age-dependent. In line with this concept, genetic linkage analyses for BMI provided evidence for age-dependent effects of select genetic loci (38).

Similarly, in a large Swedish cohort study, a genetic risk score consisting of 29 SNPs was associated not only with blood pressure and hypertension prevalence at baseline but also with new-onset hypertension and change in blood pressure over the life course (39).

### GWAS for Incident Disease Conditions

The longitudinal design of population-based cohort studies allows genetic variation to be studied in relation to disease incidence. For example, population-based cohort studies have facilitated GWAS for incident heart failure (40), incident stroke (41), and incident MI/coronary heart disease (CHD) (3). Interestingly, the GWAS for incident MI/CHD (3) reported partially discrepant results compared with GWAS using prevalent CAD cases (7). For example, the chromosome 9p21 locus, consistently replicated in case-control GWAS for CAD/MI (7, 42), provided only modest evidence for association in a GWAS for incident MI/CHD within the CHARGE consortium (3). Of note, the CHARGE consortium (Cohorts for Heart and Aging Research in Genomic Epidemiology) was founded to coordinate joint GWAS analyses of several traits in large population-based cohort studies and to provide opportunities for mutual replication efforts (43).

It is well known that analyses based on prevalent cases and those based on incident cases might yield different results if the association between an exposure and the disease outcome differs by disease severity or disease duration (a phenomenon referred to as prevalence-incidence bias) (44). To be included in a case-control study as a prevalent MI/CAD case, MI patients have to survive the acute event until they are sampled. Given that MI is still associated with substantial case fatality (45, 46), case-control studies are likely enriched for MI/CAD survivors with rather long survival (3). Thus, alleles associated with prevalent CAD in case-control analyses could be related to the risk of developing the CAD event, but they could also be related to the chances of surviving the acute CAD event. In line with this concept, the CAD risk allele at the 9p21 locus was associated with longer survival after MI in several population-based cohorts within CHARGE (3).
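
A stylized calculation makes the survival argument concrete. Assume, hypothetically, a variant that does not change the incidence of MI at all but raises the probability of surviving until a patient can be sampled as a prevalent case. The expected carrier odds ratio in a prevalent case-control study then equals the ratio of survival probabilities, even though the incidence odds ratio is 1. The function name and parameters are illustrative, and the sketch ignores competing risks, the timing of sampling, and differential participation.

```python
def prevalent_case_odds_ratio(freq, incidence, survival_carrier, survival_noncarrier):
    """Expected carrier odds ratio in a case-control study of prevalent (surviving) cases.

    Assumes the variant does NOT change disease incidence, only survival after the event.
    freq:       carrier frequency in the source population
    incidence:  cumulative incidence of the event (identical for carriers and non-carriers)
    survival_*: probability of surviving until sampled as a prevalent case
    """
    # Surviving cases: incidence is equal, so only survival differs by carrier status
    carriers_cases = freq * incidence * survival_carrier
    noncarriers_cases = (1 - freq) * incidence * survival_noncarrier
    # Controls: unaffected individuals, whose carrier proportion is also freq
    carriers_ctrl = freq * (1 - incidence)
    noncarriers_ctrl = (1 - freq) * (1 - incidence)
    return (carriers_cases / noncarriers_cases) / (carriers_ctrl / noncarriers_ctrl)

# Carriers survive the acute event more often (70% vs 50%): a spurious "risk" allele
print(round(prevalent_case_odds_ratio(0.3, 0.1, 0.7, 0.5), 3))
```

In an incident-case design, both survival probabilities are effectively 1 at the time of the event, and the odds ratio returns to 1.
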

### Impact of Genetic Variation on Risk Prediction

Furthermore, community-based prospective cohorts allow assessment of whether genetic information improves risk prediction models beyond traditional risk factors. Indeed, one of the main motivations of the Human Genome Project was to use genetic information to predict disease risk in healthy individuals and to predict the response to a given therapy among patients. Several analyses conducted in various population-based cohorts assessed whether genetic variation, e.g., aggregated in the form of risk scores, improved performance measures of risk prediction models for a first CVD event, including discrimination, calibration, and reclassification (47–50). Although the results of individual studies vary, in most cases the genetic risk scores displayed clear, statistically significant associations with CVD endpoints, but improvements in discrimination (e.g., C-statistic; integrated discrimination improvement) and reclassification (e.g., net reclassification index) were more modest (47, 48), and some studies did not provide evidence for improvement in these performance metrics beyond traditional risk factors (49, 50).
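
Discrimination is most often summarized by the C-statistic. For a binary outcome, it is the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen non-case. A minimal sketch with hypothetical predicted risks from a model without and with a genetic risk score; for time-to-event data, Harrell's C (which handles censoring) would be used instead, and the toy numbers exaggerate the improvement relative to what the studies above report.

```python
def c_statistic(scores, outcomes):
    """C-statistic (area under the ROC curve) for a binary outcome.

    Fraction of (case, non-case) pairs in which the case has the higher predicted
    risk; ties count one half. O(n_cases * n_controls), fine for illustration.
    """
    cases = [s for s, y in zip(scores, outcomes) if y == 1]
    controls = [s for s, y in zip(scores, outcomes) if y == 0]
    concordant = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                concordant += 1
            elif c == k:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))

# Hypothetical predicted risks for five individuals (two incident cases)
y = [1, 1, 0, 0, 0]
base = [0.30, 0.20, 0.25, 0.10, 0.05]       # traditional risk factors only
with_grs = [0.35, 0.22, 0.20, 0.10, 0.05]   # after adding a genetic risk score
auc_base, auc_grs = c_statistic(base, y), c_statistic(with_grs, y)
```

Because the C-statistic depends only on the ranking of individuals, a risk score that is strongly associated with disease can still change it very little if it rarely reorders cases relative to non-cases.
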

### Mendelian Randomization Analyses for Cardiovascular Traits

Genetic information in population-based cohort studies has also been used to assess causality between cardiovascular risk factors or circulating biomarkers and cardiovascular outcomes (incident CVD events) using instrumental variable analyses, a statistical approach referred to as Mendelian Randomization (MR) (51–53). The term MR refers to the random assortment of alleles of a given locus at meiosis (51, 52). Thus, if a genetic locus (or a genetic risk score) is strongly associated with circulating biomarker levels or with risk factor levels, individuals are "randomized" to genetically determined high or low biomarker/risk factor levels (51, 52, 54). If the biomarker/risk factor is causally related to CVD, this difference in genetically determined biomarker/risk factor levels should translate into corresponding quantitative differences in disease risk (51, 52, 54). Therefore, in addition to the association between the genetic variant and the risk factor/biomarker of interest, MR analyses also assess the associations between the risk factor/biomarker and incident CVD and between the genetic variant and incident CVD (52); the two latter analyses are facilitated by population-based cohort studies. By using genetic information as an instrumental variable for the biomarker/risk factor of interest, MR analyses try to avoid two important limitations of observational studies: reverse causality and confounding (54, 55).

Using MR analyses in population-based samples, several traits along the CVD continuum and biomarkers have been tested for potentially causal relations with incident CVD, including high-density lipoprotein (HDL) cholesterol (53), C-reactive protein (56), lipoprotein(a) (57), and many others. It has to be kept in mind, though, that instrumental variable analyses can be affected by different types of selection bias. For example, such analyses might be biased if a genetic variant is related to mortality and the MR analyses are conducted in an elderly sample (58, 59).
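
With a single genetic instrument, the MR logic above reduces to the Wald ratio: the variant-outcome association divided by the variant-exposure association. A minimal sketch, assuming the usual instrumental variable conditions hold (the variant is robustly associated with the exposure, affects the outcome only through it, and is not confounded); the standard error uses the common first-order approximation, which ignores uncertainty in the variant-exposure estimate. Function name and numbers are hypothetical.

```python
def wald_ratio(beta_gx, beta_gy, se_gy):
    """Wald ratio MR estimate of the causal effect of exposure X on outcome Y.

    beta_gx: per-allele association of the variant with the exposure (e.g., biomarker, SD units)
    beta_gy: per-allele association of the variant with the outcome (e.g., log hazard ratio of CVD)
    se_gy:   standard error of beta_gy
    Returns (estimate, first-order standard error).
    """
    estimate = beta_gy / beta_gx
    se = se_gy / abs(beta_gx)
    return estimate, se

# Hypothetical: each allele raises the biomarker by 0.25 SD and the log hazard of CVD by 0.05
est, se = wald_ratio(0.25, 0.05, 0.02)
ci = (est - 1.96 * se, est + 1.96 * se)  # approximate 95% CI, log hazard ratio per SD
```

A weak instrument (small beta_gx) inflates both the estimate's variance and its sensitivity to violations of the assumptions, which is why strong variant-exposure associations matter.
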

# Conclusion

Population-based studies have substantially improved our understanding of the genetic architecture of normal and abnormal organ function, CVD risk factors, circulating biomarkers, subclinical disease, and overt CVD traits over the life course. Furthermore, they were essential in exploring gene x lifestyle interactions and in evaluating genetic variation in the context of risk prediction models for incident CVD. In addition, population-based cohort studies provided great opportunities to conduct GWAS for incident CVD events, such as MI, stroke and heart failure, and thereby, to overcome classic limitations of case-control GWAS including prevalence-incidence bias. Finally, population-based cohort studies used genetic information as instrumental variables to assess whether cardiovascular risk factors or biomarkers are causally related to clinical CVD (Mendelian Randomization analyses).

# Author Contributions

WL and RV wrote the article together.

### Funding

This work was supported in part by the National Heart, Lung, and Blood Institute (NHLBI) contracts NO1-HL 25195 and HHSN268201500001I (RSV). Dr. Vasan is supported by the Evans Medical Foundation and the Jay and Louis Coffman Endowment. Dr. Lieb received grant funding from the German Ministry of Education and Research (01ER1301/13; 01ZX1606A).




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2018 Lieb and Vasan. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# GWAS Reveal Targets in Vessel Wall Pathways to Treat Coronary Artery Disease

Adam W. Turner <sup>1</sup>, Doris Wong <sup>1,2</sup>, Caitlin N. Dreisbach <sup>1,3</sup> and Clint L. Miller <sup>1,2,3,4</sup>\*

<sup>1</sup> Center for Public Health Genomics, University of Virginia, Charlottesville, VA, United States, <sup>2</sup> Department of Biochemistry and Molecular Genetics, University of Virginia, Charlottesville, VA, United States, <sup>3</sup> Data Science Institute, University of Virginia, Charlottesville, VA, United States, <sup>4</sup> Department of Public Health Sciences, University of Virginia, Charlottesville, VA, United States

Coronary artery disease (CAD) is the leading cause of mortality worldwide and poses a considerable public health burden. Recent genome-wide association studies (GWAS) have revealed >100 genetic loci associated with CAD susceptibility in humans. While a number of these loci harbor gene targets of currently approved therapies, such as statins and PCSK9 inhibitors, the majority of the annotated genes at these loci encode proteins involved in vessel wall function for which no drugs are currently known. Importantly, many of the associated genes linked to vascular (smooth muscle, endothelial, and macrophage) cell processes are now organized into distinct functional pathways, e.g., vasodilation, growth factor responses, extracellular matrix and plaque remodeling, and inflammation. In this mini-review, we highlight the most recently identified loci that have predicted roles in the vessel wall and provide genetic context for pre-existing therapies as well as for new drug targets informed by GWAS. With the development of new modalities to target these pathways (e.g., antisense oligonucleotides, CRISPR/Cas9, and RNA interference), as well as computational frameworks to prioritize or reposition therapeutics, there is great opportunity to close the gap from initial genetic discovery to clinical translation for the many patients affected by this common disease.

Keywords: genome-wide association study (GWAS), coronary artery disease (CAD), drug targets, smooth muscle cells, vascular wall

# INTRODUCTION

Coronary artery disease (CAD) is a maladaptive inflammatory disease of the coronary artery vessel wall that remains one of the leading causes of death worldwide. It involves numerous cell types (smooth muscle cells, endothelial cells, and macrophages) and often manifests as myocardial infarction. CAD develops through a combination of genetic and environmental factors. Early twin studies estimated CAD heritability at ∼40–60% (1, 2). Linkage and family-based studies identified genes with now well-established roles in disease pathogenesis, such as the LDL receptor (LDLR) (3), apolipoprotein B (apoB) (4), and proprotein convertase subtilisin/kexin type 9 (PCSK9) (5).

### Edited by:

Jeanette Erdmann, Universität zu Lübeck, Germany

### Reviewed by:

Yun Fang, University of Chicago, United States; Hsiao-Huei Chen, University of Ottawa, Canada

> \*Correspondence: Clint L. Miller clintm@virginia.edu

### Specialty section:

This article was submitted to Cardiovascular Genetics and Systems Medicine, a section of the journal Frontiers in Cardiovascular Medicine

> Received: 22 March 2018 Accepted: 29 May 2018 Published: 25 June 2018

### Citation:

Turner AW, Wong D, Dreisbach CN and Miller CL (2018) GWAS Reveal Targets in Vessel Wall Pathways to Treat Coronary Artery Disease. Front. Cardiovasc. Med. 5:72. doi: 10.3389/fcvm.2018.00072

In 2007, the first genome-wide association studies (GWAS) of CAD reported the association of the 9p21 locus with both CAD and myocardial infarction (MI) (6–8). The 9p21 locus remains the most robust locus in the genome with respect to CAD association. Many more CAD loci have been discovered in subsequent GWAS over the past decade, leading to the formation of the CARDIoGRAM (9) and C4D (10) consortia and their resulting meta-analyses (11–15). The most recent GWAS meta-analysis for CAD has ∼300,000 combined cases and controls and identified almost 100 independent loci reaching genome-wide significance (p < 5 × 10<sup>−8</sup>) and over 300 loci significant at a 5% false discovery rate.
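
The two significance criteria mentioned above differ in what they control. The fixed genome-wide threshold approximately controls the family-wise error rate across the genome, whereas a false discovery rate (FDR) criterion controls the expected proportion of false positives among the declared hits, and is therefore less stringent. A minimal sketch of the Benjamini-Hochberg procedure, one standard way to control the FDR, applied to a handful of hypothetical P-values; the cited meta-analysis may have used a different FDR method, and real analyses operate on millions of SNPs and then group significant SNPs into loci using LD, which is not shown.

```python
def benjamini_hochberg(pvalues, q=0.05):
    """Indices of hypotheses rejected by the Benjamini-Hochberg procedure at FDR level q."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k (1-based) with p_(k) <= k * q / m; reject ranks 1..k
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * q / m:
            k_max = rank
    return sorted(order[:k_max])

GENOME_WIDE = 5e-8
pvals = [1e-9, 3e-8, 2e-4, 0.01, 0.02, 0.3, 0.8]  # hypothetical
gw = [i for i, p in enumerate(pvals) if p < GENOME_WIDE]
fdr = benjamini_hochberg(pvals)
```
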

Despite the discovery of many new loci associated with CAD, the current challenges are to validate the causal genes and pathways at CAD loci and to translate this knowledge into new therapies. In this mini-review, we highlight recently GWAS-identified non-lipid genes and pathways (with an emphasis on vessel wall pathways) that have the potential to accelerate new treatments for CAD (**Figure 1**). In addition, we provide some genetic perspective on currently approved and future therapies, as well as on the use of genetic risk scores (GRS) to identify high-risk patients who may require these novel treatments to augment traditional lipid-lowering therapy.

# CAD GWAS GENES AND PATHWAYS

### Vessel Wall Signaling

Once atherogenic lipoproteins have crossed the endothelium and are taken up by macrophage-derived foam cells, a complex cascade of signaling events follows in the vessel wall. This involves a tightly orchestrated interplay of vascular smooth muscle cells, endothelial cells, macrophages, cytokines, and extracellular matrix proteins. Reactome pathway gene-set enrichment analysis carried out by the CARDIoGRAM consortium indicated that CAD genes were enriched for pathways involved in NO/cGMP signaling, TGFβ/SMAD signaling, PDGF signaling, extracellular matrix (ECM) integrity/organization, and innate immunity (16). Further integrative analyses of CARDIoGRAM summary data, tissue-specific regulatory networks, and gene expression data have revealed interactions across CAD-relevant pathways, as well as potential druggable targets such as LUM and STAT3, which serve as key regulators of vessel wall biology (17). Assuming that the genes in these pathways are the most likely causal genes at the associated loci, these results argue that vascular wall pathways contribute to CAD risk to a degree comparable to the well-established lipid- and lipoprotein-mediated pathways (16). In fact, up to 75% of the 95 CAD loci (15) appear to be associated with CAD independently of classical risk factors. This observation suggests that many of these loci act through dysregulated processes intrinsic to the vessel wall.
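Pathway enrichment of the kind described above is often assessed with an over-representation test. As a minimal sketch (not necessarily the method used in (16)), the one-sided hypergeometric test below asks how surprising it is to find k pathway genes among n CAD candidate genes, given K pathway genes among N genes in total. All numbers in the usage lines are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided over-representation p-value, P(X >= k).

    X ~ Hypergeometric: draw n genes (e.g., CAD candidates) without replacement
    from N genes, of which K belong to the pathway of interest.
    """
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)  # exact integer arithmetic until the final division


# Hypothetical: 150 CAD genes, a 100-gene pathway, 20,000 genes, 6 overlapping
# (only about 150 * 100 / 20,000 = 0.75 overlapping genes expected by chance)
p = hypergeom_enrichment_p(k=6, n=150, K=100, N=20_000)
print(f"{p:.2e}")
```

In practice, such p-values are computed for every pathway in a database such as Reactome and then corrected for multiple testing.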

### NO/cGMP Signaling

NO/cGMP signaling is fundamental to diverse cardiovascular physiological responses, and emerging evidence suggests that activation of this pathway is defective in the setting of atherosclerosis and CAD. Nitric oxide (NO) is a signaling gas synthesized by endothelial nitric oxide synthase (eNOS). Upon eNOS activation, NO signals in a paracrine fashion through the myoendothelial junction to smooth muscle cells, where it activates soluble guanylate cyclase (sGC), leading to cGMP production and cGMP-dependent protein kinase (protein kinase G; PKG)-mediated phosphorylation of downstream targets involved in vasodilation. The 1000 Genomes-based CARDIoGRAMplusC4D meta-analysis (12) and the recent UK Biobank-CARDIoGRAMplusC4D meta-analysis (15) identified an association for rs3918226 at NOS3, the gene encoding eNOS, implicating a role for endothelial dysfunction. An intronic variant, rs7692387, in GUCY1A3, which encodes the alpha1-subunit of sGC, was associated with CAD (11), while another variant, rs13139571, was associated with systolic (SBP) and diastolic (DBP) blood pressure (18). Recent functional studies identified a mechanism by which the non-risk allele at rs7692387 preferentially binds the ZEB1 transcription factor, leading to increased GUCY1A3 expression and sGC levels, which correlated with reduced atherosclerosis severity in mice (19). Other members of this pathway that have been linked to CAD include the recently identified PDE5A (rs7678555) (15) and PDE3A, previously associated at 5% FDR (11), suggesting that alterations in vascular wall signaling could potentially be rescued with existing therapies (e.g., sildenafil, cilostazol).

### TGFβ and PDGF Signaling

GWAS from the CARDIoGRAM consortium have implicated several components of the transforming growth factor beta (TGFβ) signaling pathway in CAD. Activated TGFβ receptor I phosphorylates the receptor-regulated SMAD proteins SMAD2 and SMAD3. These transcriptional mediators of TGFβ signaling translocate, along with SMAD4, to the nucleus to regulate transcription of TGFβ target genes. The TGFβ1 and SMAD3 genes are both associated with CAD, as is bone morphogenetic protein 1 (BMP1), a member of the TGFβ superfamily (20). Mechanistic studies have identified a functional intronic SNP in SMAD3 (rs17293632) that disrupts binding of the AP-1 transcription factor complex and likely underlies this association (21, 22). The genetic association of rs36096196 at the SKI locus suggests a role in CAD for SKI, a co-repressor of SMAD3/SMAD2 signaling (23).

The rs150512726 SNP [a proxy for the recently reported SNP rs142695226 (15)] results in a three-amino-acid deletion in the integrin beta 5 (ITGB5) protein. ITGB5 has been shown to play a role in activating the latent TGFβ precursor protein outside the cell (24). The TGFβ pathway also regulates gene expression at the 9p21 locus: SNPs at this locus disrupt TEAD factor binding and the TEAD3-dependent TGFβ induction of p16 in human aortic smooth muscle cells (25).

CARDIoGRAM studies have also identified SNPs at the platelet-derived growth factor D (PDGFD) locus associated with CAD at genome-wide significance. This PDGF-mediated pathway may involve many other risk-associated genes. Preliminary work by our group using genome-wide profiling of smooth muscle cells has provided evidence of cross-talk with smooth muscle cell-enriched pathways. For example, expression of TCF21, a transcription factor that determines the fate of epicardial progenitor cells during development, is increased in individuals carrying the risk alleles at rs12190287 and rs12524865 (26). Its expression was positively regulated by PDGF-BB-PDGFRB stimulation in human coronary artery smooth muscle cells (26). TCF21 dysregulation likely increases CAD risk by altering coronary artery smooth muscle cell responses to vascular injury during plaque remodeling (27, 28). Another vessel wall gene, LMOD1, which encodes an actin filament nucleator, was shown to be downregulated in vascular smooth muscle cells in response to PDGF treatment and serves as a potent marker of smooth muscle cell phenotypic modulation (29).

### Extracellular Matrix Remodeling Pathways

The CARDIoGRAM consortium has highlighted numerous extracellular matrix and basement membrane genes involved in the pathogenesis of atherosclerosis, including COL4A1/COL4A2, ITGB5, and FN1. A COL4A2 variant, rs4773144, was associated with both COL4A1 and COL4A2 expression, as well as with smooth muscle cell survival and plaque stability (30). The authors suggest that type IV collagen levels affect smooth muscle cell (SMC) proliferation, migration, extracellular matrix remodeling, apoptosis, and infiltration of immune cells through plaque remodeling. The CAD locus MIA3 is involved in the endoplasmic reticulum export of large cargo such as pre-chylomicrons/VLDL (31) and collagens (including Col4a1 and Col4a2 in mice) (32). The CAD locus SERPINH1 encodes heat-shock protein 47 (Hsp47) (33), a molecular chaperone involved in the collagen secretion pathway. FN1 encodes fibronectin, a glycoprotein with established roles in cell adhesion, migration, growth, and differentiation. Although fibronectin is increased in atherosclerotic regions, its role in the development of CAD remains unclear, with postulated roles in atherogenic lipoprotein retention, direct adverse effects on endothelial cell function, and plaque stability (34). The TNS1 gene encodes tensin-1, a protein that attaches the plasma membrane to the extracellular matrix and positively regulates the small Rho GTPase RhoA (35). The RHOA gene itself was identified as a genome-wide significant locus in the latest CARDIoGRAMplusC4D meta-analysis (15) and is predicted to interact with several other CAD genes/pathways in smooth muscle cells and endothelial cells, including TGFβ/SMAD3 and ECM proteins such as collagens and fibronectin (36). RhoA also cooperates with Rac1 and cadherin to regulate barrier function in mural and endothelial cells (37). RhoA activation coincides with endothelial cell inflammation, permeability, and disturbed flow as a result of reduced PPAP2B (itself associated with CAD and ischemic stroke) (38).
Lastly, the ADAMTS7 gene, encoding a metalloproteinase, is proatherogenic based on mouse studies, with a direction of effect consistent with the human genetic association data (39). In the context of its association with CAD, it has been proposed that ADAMTS7 alters smooth muscle cell migration and extracellular matrix composition (40).

### Inflammation and Immune Pathways

The role of inflammation in CAD pathogenesis is now well established, yet inflammatory genes are under-represented among CAD-associated loci. One of the main inflammation-related CAD loci is the interleukin 6 receptor (IL6R) gene; IL6R binds the pro-inflammatory cytokine IL-6, and its pathways have been causally linked to CAD using Mendelian randomization analyses (41). Another example is the CAD-associated CXCL12 gene, which encodes an anti-inflammatory cytokine (also known as stromal cell-derived factor 1; SDF-1) that binds the chemokine (C-X-C motif) receptor CXCR4, a G-protein coupled receptor. Given that CXCL12 is induced immediately after vessel injury and specifically expressed in atherosclerotic lesions, it has potential as a biomarker for early detection (42). The CAD-associated SH2B3 gene encodes an adapter protein known as LNK, which is involved in hematopoiesis and in suppressing cytokine and thrombopoietin signaling (43). In mice, loss of Sh2b3 was shown to promote both atherosclerosis and thrombosis only in the setting of hypercholesterolemia, suggesting involvement in platelet/leukocyte activation during atherogenesis (44). SH2B3 may also serve as an inflammatory link between vascular endothelial cells and immune cells, and as a therapeutic target for hypertension and end-organ inflammation (45). Finally, the loci encoding the ligand VEGFA and the VEGF receptor FLT1 are both associated with CAD; inflammatory conditions in the plaque promote the release of angiogenic factors that result in neovascularization, plaque remodeling, and plaque instability (46).
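Mendelian randomization, mentioned above for IL6R, uses a genetic variant as an instrument for an exposure. In its simplest single-variant form, the causal estimate is the Wald ratio of the variant's effect on the outcome to its effect on the exposure. The sketch below assumes per-allele summary statistics (outcome effect on the log odds ratio scale) and uses invented numbers; it is not the analysis reported in (41), and it ignores pleiotropy and uncertainty in the exposure association.

```python
import math
from typing import Tuple


def wald_ratio(beta_exposure: float, beta_outcome: float, se_outcome: float) -> Tuple[float, float]:
    """Single-variant Mendelian randomization estimate.

    beta_exposure: per-allele effect of the variant on the exposure (e.g., a biomarker)
    beta_outcome:  per-allele effect of the variant on the outcome (log odds ratio)
    Returns the causal estimate (log OR per unit of exposure) and a first-order
    standard error that ignores uncertainty in beta_exposure.
    """
    estimate = beta_outcome / beta_exposure
    se = se_outcome / abs(beta_exposure)
    return estimate, se


# Invented summary statistics for a hypothetical instrument
est, se = wald_ratio(beta_exposure=0.20, beta_outcome=-0.04, se_outcome=0.01)
print(f"log OR per unit = {est:.2f} (SE {se:.2f}); OR = {math.exp(est):.2f}")
```

With several independent variants, the per-variant ratios are typically combined, for example by inverse-variance weighting.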

# CURRENT THERAPIES FOR CAD

Current therapies for CAD primarily focus on alleviating the symptoms of ischemic events as well as preventing thrombosis from ruptured plaque. Here we review the current treatments for CAD and also provide a genetically informed perspective on these drug targets (**Table 1**).

### Statins

Statins are the first-line treatment for elevated LDL-cholesterol levels associated with hyperlipidemia and CAD. By inhibiting HMG-CoA (3-hydroxy-3-methylglutaryl-coenzyme A) reductase, statins decrease cholesterol synthesis in the liver, thereby reducing its concentration in the circulation. Statins also exert pleiotropic effects that attenuate other risk factors for CAD (64). Genetic studies have identified a variant in the HMGCR gene (rs12916) consistently associated with blood lipids, including LDL-cholesterol (52, 53), while an intergenic variant near HMGCR is also associated with CAD in the combined CARDIoGRAMplusC4D and UK Biobank analysis (20).

### Anti-Platelet Therapies

As a prophylactic measure against thrombosis, antiplatelet drugs are used to reduce the risk of myocardial infarction. Two of the most widely used antiplatelet drugs are acetylsalicylic acid (ASA) and clopidogrel. ASA is a cyclooxygenase (COX) inhibitor that prevents platelet activation by inhibiting the synthesis of thromboxane A2. Clopidogrel, on the other hand, is an antagonist of the ADP receptor P2Y12 that prevents platelet aggregation and further amplification of the activation signal by downregulating the glycoprotein IIb/IIIa receptor on the platelet surface (65). While the genes encoding these drug targets (PTGS2 and P2RY12) do not harbor variants specifically associated with CAD, some effector signaling molecules in these pathways (RHOA, ITGB5, and SH2B3) do have CAD associations, as described above. A pathway-based approach may therefore help explain some of the heterogeneity in responses to these commonly used agents.

### ACE Inhibitors and Beta Blockers

Two classes of drugs, angiotensin-converting enzyme (ACE) inhibitors and beta blockers, both act to maintain normal blood pressure. In the endothelium, ACE catalyzes the conversion of angiotensin I to angiotensin II, a potent vasoconstrictor. ACE also degrades bradykinin, a vasodilatory factor involved in the upregulation of nitric oxide and prostaglandins (66). Given the numerous CAD associations within the NO/cGMP pathway, the efficacy or toxicity profile of ACE inhibitors may be influenced by individual genetic variation. Beta blockers exert their cardioprotective effects by acting as competitive antagonists of adrenergic receptors in the myocardium and vasculature, depending on their selectivity for beta1- or beta2-adrenergic receptors. Clinically, reduced catecholamine stimulation lowers heart rate and blood pressure, thereby decreasing cardiac stress (67, 68). Third-generation beta blockers have been shown to have more potent blood pressure-lowering effects. Although NO-mediated signaling might reasonably be expected to contribute, it was recently demonstrated that nebivolol (compared with metoprolol) lowers blood pressure by suppressing endothelin-1 (ET-1)-mediated vasoconstriction (69). This is important given that variants at the ET-1 gene EDN1 (rs1629862) and the ET-1 receptor type A gene EDNRA (rs6841581) were recently identified as CAD loci (20).

### Anti-Inflammatory Therapies

Therapies targeting inflammatory pathways have been extensively explored in cardiovascular disease. Two recent studies investigating clonal expansion of hematopoietic cells as a potential driver of age-related atherosclerosis have provided evidence that IL1β secretion from TET2-deficient macrophages accelerates disease (70, 71). TET2 is an epigenetic modifier that negatively regulates IL1β expression. Thus, loss of TET2 function results in increased IL1β and IL-6 secretion from lesional macrophages (70). These elevated proinflammatory cytokine levels were positively correlated with increased aortic plaque size in mice and with the severity of coronary artery calcification in human patients (70, 71). Studies such as these underscore the potential of targeting the IL1β pathway to slow atherosclerosis progression.

TABLE 1 | List of current target genes for management of coronary artery disease and their genetic associations.

The CANTOS (NCT01327846) clinical trial provided critical evidence that targeting IL1β alone with the monoclonal antibody canakinumab can reduce major cardiovascular events, along with levels of proinflammatory cytokines (IL-6) and high-sensitivity C-reactive protein, in patients with atherosclerosis. Although the intermediate dose (150 mg) met the primary endpoint of reducing nonfatal myocardial infarction, nonfatal stroke, or cardiovascular death, a significantly higher rate of fatal infection relative to placebo was observed (72). Together with high cost and safety concerns, the modest clinical benefit indicates that further development is needed in this area. Given that IL-6 is a causal risk factor for CAD (73), anti-inflammatory therapies remain an attractive therapeutic approach for patients who do not respond to standard lipid-lowering medication.

# NEW CAD THERAPIES INFORMED FROM GENETIC STUDIES

### PCSK9 Inhibitors and Antisense Oligonucleotides

One example of a newly approved therapy with origins in genetic studies is the development of monoclonal antibodies against PCSK9. PCSK9 is a liver protease that targets LDL receptors for lysosomal degradation. The therapeutic potential of targeting PCSK9 was validated through Mendelian randomization studies that linked loss-of-function mutations in this gene to decreased CAD risk (74). Large clinical trials [e.g., FOURIER (NCT01764633), ODYSSEY (NCT01623115)] demonstrated that inhibiting this protease reduced circulating LDL levels to a greater extent than maximal statin therapy, with the most recent ODYSSEY trial (NCT01663402) reporting, for the first time, reductions in both cardiovascular events and all-cause mortality. In addition to monoclonal antibodies, antisense oligonucleotides against PCSK9 have been developed; their effects on clinical outcomes remain to be evaluated.

### Lipoprotein A and APOC3 Antisense Oligonucleotides

A high level of circulating lipoprotein A [Lp(a)] is considered a risk factor for cardiovascular disease. Two SNPs within the lipoprotein A (LPA) gene, rs3798220 and rs10455872, correlate with increased Lp(a) levels and are associated with increased CAD risk. As one of the first therapies targeting Lp(a), AKCEA-APO(a)-LRx is an antisense oligonucleotide that binds LPA mRNA, leading to its degradation. Phase 2 clinical trial data have suggested that this approach is well tolerated and significantly reduces plasma Lp(a) concentrations (65, 66). Similarly, an antisense therapy has been developed against APOC3, a gene involved in regulating plasma triglyceride levels. The antisense oligonucleotide volanesorsen was shown to reduce APOC3 levels and led to an overall reduction in triglyceride levels in phase 3 clinical trials (70, 71).

### RhoA-ROCK Inhibition

The RhoA-ROCK signaling pathway offers another avenue for CAD therapeutic targets. Aberrant activation of this signaling cascade has been implicated in vasoconstriction and endothelial dysfunction. Given the recent CAD association at RHOA (rs7623687), further investigation is warranted to determine how to target this gene specifically. One opportunity is to target its downstream effectors, the Rho-associated protein kinases (ROCK1 and ROCK2), which control actin cytoskeleton arrangement, cell migration, and contractility (75). In particular, the ROCK inhibitor fasudil has already been tested in clinical trials as a vasodilatory agent for CAD, acting in part through the upregulation of nitric oxide. Notably, fasudil has been approved for the treatment of cerebral vasospasm in Japan and China (75).

# PERSPECTIVES AND FUTURE DIRECTIONS

### Genetic Risk Scores

Apart from the 9p21 locus, most loci identified by GWAS of CAD have small effects, with odds ratios between 1.05 and 1.30. Nonetheless, GWAS results can be used to generate genetic risk scores for individuals based on the number of risk alleles they carry. Individuals who fall within the high CAD risk range based on their genetic risk score could therefore be selected for more aggressive therapies and/or the novel CAD treatments described above, in addition to traditional drug treatments such as statins. With more data on the horizon from sources such as the UK Biobank, the Million Veteran Program, and the NIH-funded All of Us Research Program, genetic risk scores should gain more clinically relevant predictive utility (76).
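A genetic risk score can be sketched as a sum over risk alleles: either a simple count, or a count weighted by each variant's log odds ratio, so that exponentiating the weighted score gives the combined odds ratio relative to carrying no risk alleles under a multiplicative (additive log-odds) model. The example below is a minimal sketch under that assumption; the SNP names and odds ratios are hypothetical, chosen only to fall within the 1.05 to 1.30 range quoted above.

```python
import math
from typing import Dict


def genetic_risk_score(dosages: Dict[str, float],
                       odds_ratios: Dict[str, float],
                       weighted: bool = True) -> float:
    """Sum of risk-allele dosages (0, 1 or 2 copies), optionally weighted by log(OR)."""
    score = 0.0
    for snp, dosage in dosages.items():
        weight = math.log(odds_ratios[snp]) if weighted else 1.0
        score += weight * dosage
    return score


# Hypothetical per-allele odds ratios in the 1.05-1.30 range quoted above
ors = {"snp_a": 1.30, "snp_b": 1.10, "snp_c": 1.05}
person = {"snp_a": 2, "snp_b": 1, "snp_c": 0}  # risk-allele counts for one individual

print(genetic_risk_score(person, ors, weighted=False))  # 3.0 risk alleles
w = genetic_risk_score(person, ors)
print(round(math.exp(w), 3))  # 1.859, i.e. 1.30**2 * 1.10 under a multiplicative model
```

Weighting by log(OR) lets the few larger-effect loci contribute more than the many small-effect ones, which is one reason weighted scores are commonly preferred over raw allele counts.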

### Feasible vs. Difficult Drug Targets

Because GWAS have highlighted the role of vessel wall genes and signaling pathways in the pathogenesis of CAD, it will be critical to apply this knowledge to vessel wall therapeutic development. Strategies include non-specific targeting of the vessel wall (through upstream or downstream effector molecules), specific targeting of the plaque vasculature, and targeting of specific cellular phenotypes (e.g., activated resident macrophages or phenotypically modulated smooth muscle cells).

### Target New Cell Types (e.g., Endothelial Cells, Smooth Muscle Cells, Macrophages)

A CAD-protective variant upstream of ADAMTS7 confers greater protection against CAD in never-smokers than in individuals who have smoked 100 or more cigarettes in their lifetime (77). This example highlights the importance of accounting for environmental factors when managing treatment. Other potential targets include the endothelin-1 receptors on smooth muscle cells. Many of these potential vessel wall targets affect smooth muscle cell proliferation and migration, processes originally believed to drive atherogenesis. The current view, however, suggests that smooth muscle cell proliferation and migration could be reparative and promote plaque stability (78). Once the roles and timing of these processes are clarified, the TGFβ and PDGF pathways may be attractive targets given their role in regulating smooth muscle cell genes.
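A variant whose protective effect depends on smoking history can be illustrated by computing the odds ratio separately within each stratum. The sketch below uses invented case/control counts, not data from (77), and is only descriptive: a formal test of gene-environment interaction would typically fit a logistic regression with a genotype × smoking term.

```python
def odds_ratio(carrier_cases: int, carrier_controls: int,
               noncarrier_cases: int, noncarrier_controls: int) -> float:
    """Odds ratio for CAD in variant carriers vs. non-carriers from a 2x2 table."""
    return (carrier_cases * noncarrier_controls) / (carrier_controls * noncarrier_cases)


# Invented case/control counts, stratified by smoking history
or_never = odds_ratio(80, 120, 200, 200)   # never-smokers
or_ever = odds_ratio(150, 160, 300, 300)   # ever-smokers
# Ratio of odds ratios: a value < 1 means the variant is more protective in never-smokers
interaction = or_never / or_ever
print(f"{or_never:.3f} {or_ever:.3f} {interaction:.3f}")
```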



### Machine Learning/Systems Approaches

While GWAS have uncovered invaluable insights into potential therapies and validated existing ones, these associations require extensive follow-up to pinpoint causal variants, genes, and pathways. Machine learning algorithms can be leveraged to prioritize targets using diverse data inputs such as electronic health records, clinical notes, and -omics data. These approaches can help to systematically reduce noise, reduce the number of features, and identify gene sets of interest, complementing standard GWAS statistics such as odds ratios, p-values, and chi-square tests. Unsupervised learning algorithms can provide researchers and clinicians with an unbiased network of candidate genes that account for the greatest variance in CAD-related phenotypes. A specific example is the use of machine learning for drug repurposing, based on finding patterns in multi-dimensional datasets. Specific tools have been developed to provide an out-of-the-box approach for non-data scientists to analyze diverse text, biological, and medical record data. One such tool, RepurposeDB, combines drug and disease information to create a reference database for drug repositioning research (79). Given the rapidly growing costs of drug discovery and development, such data-informed approaches could substantially advance the field.
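Principal component analysis (PCA) is one classic unsupervised method for finding the direction of greatest variance in a samples-by-genes matrix; the text does not name a specific algorithm, so this is only an illustrative choice. The sketch below finds the first principal component by power iteration on the sample covariance matrix, in plain Python for transparency; a real analysis would use a numerical library's SVD. The toy data are hypothetical: genes 1 and 2 co-vary strongly, gene 3 is near-constant noise.

```python
from typing import List


def first_principal_component(data: List[List[float]], iters: int = 500) -> List[float]:
    """Leading eigenvector of the sample covariance matrix, via power iteration.

    Rows of `data` are samples (e.g., patients); columns are features (e.g., genes).
    The all-ones starting vector is assumed not to be orthogonal to the answer.
    """
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in data]
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(p)]
           for a in range(p)]
    v = [1.0] * p
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]  # w = C v
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]  # renormalize to unit length
    return v


# Hypothetical expression of three genes in five samples
expr = [[1, 2, 0.1], [2, 4, -0.1], [3, 6, 0.0], [4, 8, 0.1], [5, 10, -0.1]]
pc1 = first_principal_component(expr)
print([round(x, 3) for x in pc1])  # loadings dominated by genes 1 and 2
```

Genes with large absolute loadings on the leading components are natural candidates for follow-up, which is the sense in which such methods prioritize genes without prior labels.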

# CONCLUDING REMARKS

In summary, in this brief review we draw attention to genetic loci discovered over the past decade that play critical roles in the vessel wall. Many of these genes are organized into distinct functional pathways, which will help redefine some of the pathogenic mechanisms and prioritize those pathways for future drug development or repurposing strategies.

# AUTHOR CONTRIBUTIONS

AT and CM conceived the manuscript. AT, DW, CD, and CM wrote the manuscript.

# FUNDING

Funding support provided by the National Institutes of Health (R00 HL125912) to CM.





**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Turner, Wong, Dreisbach and Miller. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Integrating Genes Affecting Coronary Artery Disease in Functional Networks by Multi-OMICs Approach

Baiba Vilne<sup>1,2</sup> and Heribert Schunkert<sup>1,2</sup>\*

<sup>1</sup> Deutsches Herzzentrum München, Klinik für Herz- und Kreislauferkrankungen, Technische Universität München, Munich, Germany, <sup>2</sup> Munich Heart Alliance, German Centre for Cardiovascular Research, Munich, Germany

Coronary artery disease (CAD) and myocardial infarction (MI) remain among the leading causes of mortality worldwide, urgently demanding a better understanding of disease etiology and more efficient therapeutic strategies. Genetic predisposition as well as environment and lifestyle are thought to contribute to disease risk. Non-linear and complex interactions likely occur between these multiple factors, involving simultaneous pathological changes in diverse cell types, tissues, and organs, at multiple molecular levels. Recent technological advances have exponentially expanded the breadth of available -omics data, from the genome, epigenome, transcriptome, proteome, and metabolome to even the microbiome. Integration of multiple layers of information across several -omics domains, i.e., the so-called multi-omics approach, currently holds promise as a path toward precision medicine. Indeed, a more meaningful interpretation of genotype-phenotype relationships and the development of successful therapeutics tailored to individual patients are urgently needed. In this review, we will summarize recent findings and applications of integrative multi-omics in elucidating the etiology of CAD/MI, with a special focus on established disease susceptibility loci sequentially identified in genome-wide association studies (GWAS) over the last 10 years. Moreover, in addition to the autosomal genome, we will also consider genetic variation in our "second genome", the mitochondrial genome. Finally, we will summarize the current challenges in the field and point to future research directions required to apply these approaches successfully and effectively for precision medicine.

### Edited by:

Jeanette Erdmann, Universität zu Lübeck, Germany

### Reviewed by:

Krishna Aragam, Massachusetts General Hospital, Harvard Medical School, United States

Francesco Danilo Tiziano, Università Cattolica del Sacro Cuore, Italy

> \*Correspondence: Heribert Schunkert schunkert@dhm.mhn.de

### Specialty section:

This article was submitted to Cardiovascular Genetics and Systems Medicine, a section of the journal Frontiers in Cardiovascular Medicine

> Received: 25 March 2018 Accepted: 22 June 2018 Published: 17 July 2018

### Citation:

Vilne B and Schunkert H (2018) Integrating Genes Affecting Coronary Artery Disease in Functional Networks by Multi-OMICs Approach. Front. Cardiovasc. Med. 5:89. doi: 10.3389/fcvm.2018.00089

Keywords: cardiovascular disease, multi-omics, genomics, transcriptomics, metabolomics, gut microbiome

# INTRODUCTION

In the current era of high-potency statin therapy, it has become increasingly clear that even individuals with normal LDL-cholesterol levels and no conventional risk factors may develop atherosclerosis (1). The most pertinent manifestation of atherosclerosis is coronary artery disease (CAD), a highly complex disease influenced by both multiple genetic risk variants and lifetime exposure to an atherogenic environment (2). A better understanding of the etiology of CAD and of disease mechanisms not yet addressed therapeutically is urgently needed (3). During the last 10 years, genetic risk has been thoroughly explored in numerous genome-wide association studies (GWAS), leading to the identification of >300 chromosomal loci that significantly affect the risk of CAD (4–15). More than 90% of these common disease risk variants are located outside protein-coding regions and have modest effect sizes (2, 16). Collectively, they explain only ∼25% of the overall disease heritability. This suggests that genetic variation may contribute to disease risk in a non-linear, interactive, and complex way (17), leading to pathological changes in diverse cell types, tissues, and organs, at multiple molecular levels (18).

Recent technological advances have exponentially expanded the breadth of available -omics data (17). High-throughput monitoring of the abundance of various biological molecules and determination of their variation between different conditions on a global scale has become possible, promoting a paradigm shift in the way we approach biomedical problems (19). At the same time, it has been increasingly recognized that no single type of data can fully capture the intricacy of most complex molecular traits that manifest collectively as disease phenotypes (20–22). Rather, it is the integration of multiple layers of information across several -omics domains, i.e., the so-called multi-omics approach [also referred to as integromics or panomics (19)], that holds promise for precision medicine (**Figure 1**) (19).

Of note, integrative analysis across multiple -omics layers can be conducted in two ways (**Figure 2**): pair-wise data integration and multi-dimensional (i.e., network-based) integration (22). Pair-wise integrations can be further divided into genetic and non-genetic correlations (22). In the first case, DNA variants (i.e., allelic distributions of single-nucleotide polymorphisms; SNPs) are tested for association with downstream omics markers, such as transcriptomic alterations, protein, metabolite, or methylation levels, or quantitative and qualitative measures of the microbiome, via so-called quantitative trait locus (QTL) mapping. In the second case, one explores correlations between downstream omics data, e.g., between CpG methylation levels and transcript expression or between the metabolome and the gut microbiome; however, causal relationships may be difficult to infer in such cases (22). Considering the largely unexplored role of the established CAD risk loci from GWAS (23) and the central dogma that genetic variation controls the transcriptome, which in turn affects the proteome (20) and metabolome (**Figure 2,** middle panel), our main focus will be pair-wise integrations linking genetic variation related to CAD risk to downstream omics layers such as the epigenome, transcriptome, proteome, or metabolome. Although multi-dimensional integrations have been widely used in cancer research, their application in the context of CAD has so far been limited (22). Moreover, in addition to the autosomal genome, we will also consider genetic variation in our "second genome", the mitochondrial genome, and its contribution to CAD.
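The genetic case of pair-wise integration, QTL mapping, can be sketched at its simplest as a regression of a downstream molecular trait (for example, a transcript or methylation level) on allele dosage at one SNP under an additive model. The example below is a minimal, assumption-laden sketch with hypothetical data: it fits ordinary least squares for a single SNP-trait pair and returns the slope and its t statistic, whereas real eQTL/meQTL pipelines test millions of pairs and adjust for covariates such as ancestry and technical factors.

```python
import math
from typing import List, Tuple


def qtl_test(dosage: List[float], trait: List[float]) -> Tuple[float, float]:
    """OLS slope of a molecular trait on allele dosage (0/1/2) and its t statistic."""
    n = len(dosage)
    mx = sum(dosage) / n
    my = sum(trait) / n
    sxx = sum((x - mx) ** 2 for x in dosage)
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosage, trait))
    beta = sxy / sxx                      # per-allele effect on the trait
    alpha = my - beta * mx                # intercept
    rss = sum((y - alpha - beta * x) ** 2 for x, y in zip(dosage, trait))
    se = math.sqrt(rss / (n - 2) / sxx)   # standard error of the slope
    return beta, beta / se


# Hypothetical genotypes and expression values for six individuals
genotypes = [0, 0, 1, 1, 2, 2]
expression = [1.0, 1.2, 2.0, 2.2, 3.0, 3.2]
beta, t = qtl_test(genotypes, expression)
print(f"beta = {beta:.2f}, t = {t:.1f}")
```

A large |t| (converted to a p-value and corrected for the number of tests) would flag the SNP as a QTL for that trait.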

# INTEGRATING GENETIC VARIATION AND EPIGENOME

Epigenomic signatures reflect various DNA modifications and may affect gene regulatory mechanisms that do not involve changes in the DNA sequence per se. The epigenome may thereby become a critical mediator of environmental influences and risk factors acting on the genome (20, 24). Three distinct but highly interrelated epigenetic processes can be distinguished: DNA methylation, histone modifications (e.g., methylation, acetylation, phosphorylation, ADP-ribosylation, and ubiquitination), and RNA-based mechanisms (e.g., microRNAs, long non-coding RNAs or lncRNAs, and small interfering RNAs) (20, 24). Although non-coding RNAs technically belong to the epigenome (20), we will discuss them in the next section, as the respective omics data are acquired via transcriptome profiling (RNA-seq).

DNA methylation and histone modifications are the best understood epigenetic mechanisms thus far and have been widely suggested to regulate gene expression and affect CAD risk factors including atherosclerosis, inflammation, hypertension, and diabetes (25). DNA methylation is the covalent addition of a methyl group to the C5 position of cytosine residues that are followed by guanine residues (CpG dinucleotides). It is partly heritable but also a dynamic process related to environmental stimuli and lifestyle factors (26). Hedman et al. (27) analyzed epigenetic changes associated with lipid concentrations and identified a number of methylation quantitative trait loci (meQTLs) enriched in signals from GWAS on lipid levels and CAD. For example, genome-wide significant variants at APOB (rs563290 and its proxies) associated with LDL cholesterol and CAD were meQTLs for an LDL cholesterol-related differentially methylated locus (**Table 1** and **Figure 3**).

Furthermore, the CDH13 (T-cadherin) locus may present an interesting example in the context of epigenetics and CAD. Putku et al. (39) reported several genetic variants in the CDH13 promoter as meQTLs in patients with hypertension (**Table 1**), several of which were also associated with high molecular weight adiponectin, a known ligand for CDH13 whose binding results in increased proliferation and migration of endothelial cells (39). More recently, Nelson et al. (13) identified a genetic variant in an intron of CDH13 that affects expression of this gene in vascular tissues and is genome-wide significantly associated with CAD (28) (**Table 1**). Interestingly, the expression levels of CDH13 and of lncRNAs from the same locus were positively correlated, suggesting a functional link, as lncRNAs are known to display correlations with the expression of their neighboring protein-coding target genes (48).

An exciting field of future research will be studies conducting parallel profiling of genetic variation with histone modifications and Hi-C and ChIA-PET-based chromatin contact maps to uncover local and distal histone quantitative trait loci (hQTLs) (49) in CAD patients.

Overall, given the role of epigenetic modifications as critical mediators of environmental influences on the genome (20, 24), genome-wide studies of DNA methylation and other epigenetic modifications in sufficiently large cohorts are urgently needed. Ideally, such studies would also elucidate differences between tissues and cell types in healthy individuals vs. CAD patients. They should be supplemented with careful documentation of multiple environmental and lifestyle factors over time, i.e., the envirome, as well as comprehensive clinical information, to link the environment to CAD.

FIGURE 1 | Multi-omics approach for precision medicine. Multi-omics (i.e., genome, epigenome, transcriptome, proteome, metabolome, microbiome, and envirome) data are collected from patients and integrated to create their individual molecular signatures (i.e., complex biomarkers), which are then used to select an appropriate drug for a particular patient, thus improving the treatment efficiency and reducing the possible side effects.

### INTEGRATING GENETIC VARIATION AND TRANSCRIPTOME

Transcriptomics provides genome-wide measurements of RNA levels, covering both protein-coding RNAs and non-coding RNAs (i.e., microRNAs, lncRNAs, and small interfering RNAs), under specific conditions or in a specific cell type. Transcripts are examined both qualitatively (i.e., which transcripts are present, including novel transcripts, splice sites, and RNA editing sites) and quantitatively (i.e., transcript abundance) (21).

### Protein-Coding RNAs

Parallel assessment of genetic variation and transcriptome profiles across disease-relevant tissues has been the most commonly applied approach (28, 29, 50–52). This typically means mapping expression quantitative trait loci (eQTLs) to identify (mainly protein-coding) susceptibility genes. Björkegren et al. have performed a number of integrative network analyses linking CAD risk variants and transcriptome data from seven disease-relevant vascular and metabolic tissues, collected from up to 600 CAD patients during coronary artery bypass surgery (28, 29, 53, 54). These investigations identified visceral abdominal fat as an important gene-regulatory site for blood lipids: several risk SNPs for HDL, LDL, and total cholesterol levels, as well as for CAD, showed significant eQTL effects in visceral abdominal fat (28, 29).
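The core statistical step of eQTL mapping can be sketched as an additive linear model. In this model, the expression of a gene is regressed on the genotype dosage (0, 1 or 2 copies of an allele) of a nearby variant. The Python sketch below is a simplified illustration under that assumption, using hypothetical toy data. Real pipelines, including those behind the cited studies, additionally adjust for covariates (e.g., ancestry, sex, hidden confounders) and correct for testing many variant-gene pairs.

```python
import math


def additive_eqtl(dosages, expression):
    """Ordinary least squares fit of expression = a + beta * dosage.

    dosages: allele counts (0, 1 or 2) per individual.
    expression: normalized expression values for the same individuals.
    Returns (beta, t_statistic, r_squared).
    """
    n = len(dosages)
    if n != len(expression) or n < 3:
        raise ValueError("need matched vectors for at least 3 individuals")
    mean_g = sum(dosages) / n
    mean_y = sum(expression) / n
    sxx = sum((g - mean_g) ** 2 for g in dosages)
    syy = sum((y - mean_y) ** 2 for y in expression)
    if sxx == 0 or syy == 0:
        raise ValueError("genotype or expression has no variance")
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(dosages, expression))
    beta = sxy / sxx                      # expression change per allele copy
    intercept = mean_y - beta * mean_g
    rss = sum((y - intercept - beta * g) ** 2 for g, y in zip(dosages, expression))
    r_squared = 1.0 - rss / syy           # share of expression variance explained
    se_beta = math.sqrt(rss / (n - 2) / sxx)
    t_stat = beta / se_beta if se_beta > 0 else math.copysign(math.inf, beta)
    return beta, t_stat, r_squared


# Hypothetical toy data: six individuals, two per genotype class.
beta, t_stat, r2 = additive_eqtl([0, 0, 1, 1, 2, 2], [1.0, 1.2, 2.0, 2.2, 3.1, 2.9])
print(f"beta={beta:.2f}  t={t_stat:.1f}  R^2={r2:.2f}")
```

A p-value would then follow from the t distribution with n - 2 degrees of freedom.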

Huan et al. (30) also used integrative analysis to investigate the molecular mechanisms of blood pressure regulation. They identified a blood pressure-associated SNP (rs3184504) in SH2B3 that was also associated with the expression (eQTL) of several genes in the genetically inferred causal blood pressure gene sets, including SH2B3 itself (**Table 1** and **Figure 4**). Some of these genes were also perturbed in Sh2b3−/− mice, which show a blood pressure-related phenotype (30). Rs3184504 has also previously been associated with CAD risk (9).

TABLE 1 | Genetic variation related to CAD/MI risk that has been associated with other downstream omics layers such as transcriptome (mRNA, microRNAs and lncRNAs), epigenome, proteome or metabolome.

Much less investigated are non-coding RNA transcripts, such as microRNAs (miRNAs) and lncRNAs. Recent evidence suggests that at least some of these may play a role in CAD (55–58). Although non-coding RNAs technically belong to the epigenome (20), we discuss them in this section, as the respective omics data are acquired via transcriptome profiling (RNA-seq).

### Micro RNAs

MiRNAs are involved in the post-transcriptional regulation of gene expression in all main cell types participating in atherosclerosis progression, including endothelial cells, vascular smooth muscle cells, and macrophages (32, 59). Several studies have investigated differential miRNA expression in CAD vs. non-CAD patients, as summarized by Malik et al. (60). The sample types studied include plasma/serum, microparticles, whole blood, platelets, and blood mononuclear, intimal, and endothelial progenitor cells. In the majority of cases, miRNAs were up-regulated in CAD patients (60). Moreover, a growing body of evidence suggests that genetic variation in the miRNA targetome may have major deleterious consequences (61, 62). For example, Miller et al. (31) showed that an established CAD risk variant (rs12190287) resides in the 3′ untranslated region (3′ UTR) of the transcription factor TCF21 and alters the seed binding sequence for miR-224. Allelic imbalance studies in circulating leukocytes and human coronary artery smooth muscle cells further demonstrated a significant imbalance of TCF21 transcript levels that correlated with genotype at rs12190287. This is consistent with the variant contributing to allele-specific expression differences (31). Richardson et al. (33) reported that a variant (rs13702) in the 3′ UTR of lipoprotein lipase (LPL) disrupts the binding of miR-410 and modulates the effect of diet on plasma lipid levels (33).

Bastami et al. (34) recently performed a more systematic computational screen, mapping the established CAD risk variants to the miRNA targetome and identifying several links between SNPs and miRNAs (**Table 1**; https://www.ebi.ac.uk/gwas/). In a recent study from our group (16), we also mapped CAD risk variants from the CARDIoGRAMplusC4D GWAS meta-analyses (9) to the 3′ UTRs of genes to assess their overlap with predicted miRNA binding sites. Interestingly, the 3′ UTR of MRAS was predicted to be targeted by 29 miRNAs, and 23 miRNAs were predicted to bind more than one candidate CAD gene (**Table 1**).

Thus far, relatively few studies have investigated genome-wide miRNA eQTLs (miR-eQTLs). Huan et al. (35) identified a genetic variant (rs2370747) associated with miR-100-5p and miR-125b-5p expression. A proxy SNP of this variant was also associated with lipid traits (HDL, LDL, and total cholesterol as well as triglycerides). Moreover, both miRNAs were also differentially expressed in relation to HDL cholesterol (35).
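The 3′ UTR overlap analysis described above can be reduced to simple interval and set operations. It involves two steps: checking whether risk variants fall within predicted miRNA binding sites, and counting miRNAs predicted to target several candidate genes. The Python sketch below assumes that binding-site predictions are already available as coordinate intervals on the same reference. The prediction tool that produces them is not modeled here, and all SNP, gene and miRNA identifiers are hypothetical placeholders.

```python
from collections import defaultdict


def snps_in_binding_sites(snp_positions, binding_sites):
    """Return (snp_id, mirna) pairs where a SNP lies inside a predicted site.

    snp_positions: dict mapping SNP id -> position.
    binding_sites: list of (mirna, start, end) tuples with inclusive coordinates,
    on the same reference as the SNP positions.
    """
    hits = []
    for snp_id, pos in snp_positions.items():
        for mirna, start, end in binding_sites:
            if start <= pos <= end:
                hits.append((snp_id, mirna))
    return hits


def mirnas_targeting_multiple_genes(targets_by_gene):
    """Return miRNAs predicted to bind the 3' UTRs of more than one gene."""
    genes_by_mirna = defaultdict(set)
    for gene, mirnas in targets_by_gene.items():
        for mirna in mirnas:
            genes_by_mirna[mirna].add(gene)
    return sorted(m for m, genes in genes_by_mirna.items() if len(genes) > 1)


# Hypothetical inputs (placeholder identifiers, not real predictions).
print(snps_in_binding_sites({"rsA": 120, "rsB": 400},
                            [("miR-X", 110, 130), ("miR-Y", 300, 320)]))
# [('rsA', 'miR-X')]
print(mirnas_targeting_multiple_genes({"GENE1": {"miR-X", "miR-Y"},
                                       "GENE2": {"miR-Y", "miR-Z"},
                                       "GENE3": {"miR-Z"}}))
# ['miR-Y', 'miR-Z']
```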

Civelek et al. (36) examined the genetic regulation of human adipose miRNA expression and its consequences for metabolic traits. Interestingly, this study showed how genetic variation might influence miRNA processing, i.e., the ratio of mature miRNA expression from the 3p and 5p arms. A miRNA precursor can give rise to two mature miRNAs, one from each arm, and one of them is usually more highly expressed than the other. The 3p/5p ratios of several miRNAs have been shown to differ significantly among healthy tissues (63) and to be altered in pathological conditions compared with healthy controls (64). Civelek et al. demonstrated a significant association of the SNP rs13064131 with the 3p/5p ratio of miR-28, which is encoded within the LPP gene (**Figure 5**) (36). However, the SNP was associated neither with the expression level of the LPP transcript itself nor with the abundance of miR-28-3p or miR-28-5p. This suggests that its effect on the 3p/5p ratio may be independent of transcription, possibly acting via degradation or stabilization mechanisms.
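The 3p/5p arm-ratio phenotype can be made concrete with the Python sketch below. It computes a log2 ratio of 3p to 5p read counts per sample, with an assumed pseudocount of 1 to avoid division by zero, and averages that ratio by genotype dosage. The read counts are hypothetical. A formal association test, as in the cited study, would instead regress the ratio on dosage, much like an eQTL model.

```python
import math
from statistics import mean


def log2_arm_ratio(reads_3p, reads_5p, pseudocount=1.0):
    """log2 of the 3p/5p read ratio; the pseudocount avoids division by zero."""
    return math.log2((reads_3p + pseudocount) / (reads_5p + pseudocount))


def mean_ratio_by_genotype(samples):
    """samples: list of (genotype_dosage, reads_3p, reads_5p) tuples.

    Returns {dosage: mean log2(3p/5p)} with dosages in ascending order.
    """
    groups = {}
    for dosage, reads_3p, reads_5p in samples:
        groups.setdefault(dosage, []).append(log2_arm_ratio(reads_3p, reads_5p))
    return {dosage: mean(values) for dosage, values in sorted(groups.items())}


# Hypothetical read counts for one miRNA precursor in four samples.
samples = [(0, 7, 31), (0, 15, 63), (1, 15, 15), (2, 31, 7)]
print(mean_ratio_by_genotype(samples))  # {0: -2.0, 1: 0.0, 2: 2.0}
```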

### Long Non-Coding RNAs

The recent discovery of an extensive catalog of lncRNAs, i.e., long RNA transcripts that do not code for proteins, has opened a new perspective on the importance of RNA-based mechanisms in gene regulation (24). LncRNAs are emerging as important regulators of various cellular processes, with many possible implications for cardiovascular disease pathophysiology (57, 58). In fact, the most prominent CAD risk locus, at Chr9p21 (66, 67), harbors the lncRNA ANRIL (Antisense Noncoding RNA in the INK4 Locus, also known as CDKN2B antisense RNA). Of the variants at this locus, rs10757274 is the strongest genetic predictor of early MI. It is not associated with established CAD risk factors such as lipoproteins or hypertension, which makes ANRIL a key candidate (38). Interestingly, ANRIL exists both as a linear lncRNA (linANRIL), whose transcript levels are known to correlate positively with disease severity (68), and as circular RNAs (circANRIL) (69). Recently, Holdt et al. (69) demonstrated that circANRIL regulates the maturation of precursor ribosomal RNA (pre-rRNA). It thereby impairs ribosome biogenesis and induces nucleolar stress and apoptosis in vascular smooth muscle cells and macrophages (**Figure 6**). Carriers of the CAD-protective haplotype at 9p21 showed significantly increased expression of circANRIL (69).

To date, however, few large-scale studies have examined lncRNAs in the context of CAD. Ballantyne et al. (37) recently conducted a genome-wide interrogation of long intergenic non-coding RNAs (lincRNAs) associated with cardiometabolic traits in GWAS, including CAD. They identified a number of CAD/MI- and type 2 diabetes-associated SNPs at Chr9p21 that overlapped lincRNA transcripts (**Table 1**) (37). In STARNET (28), 5.4% of the identified cis-eQTLs were related to the expression of lncRNAs; however, these have not been explored further so far. Overall, more studies of non-coding RNAs in different CAD-relevant tissues and in sufficiently large cohorts will be required. Such studies should yield insights into the possible functional roles of this portion of the transcriptome and its genetic determinants in health and disease. Moreover, because lncRNAs are generally expressed at lower levels, RNA-seq experiments will need sufficient depth of coverage (21).

### INTEGRATING GENETIC VARIATION AND PROTEOME

Proteomics uses high-throughput approaches (mainly mass spectrometry-based) to quantify protein abundance, post-translational modifications and interactions (e.g., using phage display and yeast two-hybrid assays) in a tissue, cell or fluid compartment such as plasma or urine (21). Such studies are urgently needed to explore CAD etiology, for three reasons: transcript levels are not linearly proportional to protein levels, proteins are the biomolecules that execute cellular functions, and many human diseases ultimately result from alterations in the proteome (70). However, proteome studies are still rare in relation to CAD, mostly due to the complex methodology involved. In the past few years, some investigations have characterized the proteomes of several CAD-related tissues and cell types, including human arterial smooth muscle cells (71) and platelets (72), as well as body fluids such as urine (73).

Only a few studies (14, 40) have analyzed genetic variants that modify protein levels, i.e., protein quantitative trait loci (pQTLs) (**Table 1**). Chen et al. (40) assayed a preselected set of plasma proteins and identified several pQTLs that overlapped with CAD risk SNPs and explained a substantial proportion of inter-individual variation in protein abundance. For example, rs12740374 at the CELSR2/SORT1 locus, a variant associated with lipids and CAD, explained 15% of inter-individual variation in plasma granulin levels (**Figure 7**). Interestingly, progranulin binds to SORT1, and Sort1 knockout mice show markedly elevated levels of progranulin (40). Recently, progranulin was also shown to be involved in lysosomal homeostasis and lipid metabolism (74).
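The "variance explained" figure quoted for such pQTLs can be related to the per-allele effect size with a standard formula. The following assumptions apply: an additive model, a biallelic variant with allele frequency $p$, and Hardy-Weinberg equilibrium. Under these assumptions the genotype dosage $G \in \{0, 1, 2\}$ has variance $2p(1-p)$, so a per-allele effect $\beta$ on a protein level $Y$ explains the following fraction of its variance:

$$
R^2 = \frac{\operatorname{Var}(\beta G)}{\operatorname{Var}(Y)} = \frac{2p(1-p)\,\beta^2}{\operatorname{Var}(Y)}
$$

As an illustration, take the hypothetical values $p = 0.25$ and $\beta = 0.5$, with $Y$ scaled to unit variance. This gives $R^2 = 2 \cdot 0.25 \cdot 0.75 \cdot 0.5^2 \approx 0.094$, i.e., about 9% of the variance. These numbers only illustrate the relationship and are not the values reported by Chen et al.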

As proteomics technologies improve (21), more genome-wide investigations of CAD-related alterations in the proteome, and also in the phosphoproteome, are expected in the near future across increasing numbers of disease-relevant tissues. However, as proteins are more sensitive to their environment (21), particular care will be needed during sample preparation to obtain accurate and reproducible results.

### INTEGRATING GENETIC VARIATION AND METABOLOME

An important additional functional layer in multi-omics data integration is the metabolome, as it represents an integrated state of all genetic, epigenetic and environmental factors, thus providing a link between genotype and phenotype (75). Metabolomics systematically identifies and quantifies small molecules (typically <1,500 Daltons), i.e., metabolites, such as amino acids, fatty acids, carbohydrates and biochemical intermediates (21). A plethora of metabolites in blood and urine have been associated with CAD and subsequent cardiovascular events (76–79). They have also been shown to be promising biomarkers for discriminating CAD vs. non-CAD subjects (78), as well as thrombotic MI vs. stable CAD cases (80). Kraus et al. (42) recently identified several genetic loci associated with blood plasma metabolites (i.e., metabolomic quantitative trait loci; mQTLs). The strongest findings were associations of circulating short-chain dicarboxylacylcarnitine (SCDA) levels with variants in genes that regulate components of endoplasmic reticulum (ER) stress (**Table 1** and **Figure 8**) (42).

Besides blood and urine, metabolomic profiles of vascular and metabolic tissues such as subcutaneous fat will need to be generated, ideally in conjunction with data from other omics layers. Integration with the gut microbiome would be of particular interest, given the close link between gut microbiota and the metabolome (81).

Of note, however, metabolic profiles are even more prone to variability introduced by sample preparation and storage conditions, as well as by several other factors including patient heterogeneity (21). Hence, the required sample size has to be carefully considered to ensure confidence in the generated results.

### INTEGRATING GENETIC VARIATION AND MICROBIOME

Microbiomics investigates all the microorganisms of a given community, including bacteria, viruses, and fungi, collectively known as the microbiota, whose genes constitute the microbiome (21). The human microbiome is enormously complex. Microbiota composition varies substantially between individuals as a result of colonization during birth and development, diet and other environmental factors, drugs, and age (21). Thousands of bacterial species make up the human microbiome. These include a small number of abundant species and a large number of rare or low-abundance species whose functions remain poorly understood (82). Several large-scale initiatives are currently emerging, including the American Gut Project (http://americangut.org/) and the British Gut Project (http://britishgut.org/). These are expected to produce a rich collection of anonymised human gut samples and lifestyle information for medical researchers.

The gut microbiome has emerged as another rich source of information and as a possible new contributor to CAD/MI pathogenesis (82–84). Bacteria have long been known to activate inflammatory pathways. Recent data demonstrate that the gut microbiome may also affect lipid metabolism and influence the development of obesity and atherosclerosis (84), suggesting that gut microbiota could be used as a diagnostic marker for CAD (85). The most investigated link is that between gut microbiota and fasting plasma levels of trimethylamine N-oxide (TMAO), a gut microbiota-dependent metabolite previously associated with CAD and stroke (81, 86). Org et al. (81) demonstrated that certain blood plasma metabolites correlated strongly with gut microbial community structure and that some of these correlations may be specific to the pre-diabetic state. Le Chatelier et al. (84) used quantitative gut microbiome information to distinguish individuals with "high bacterial richness" from those with "low bacterial richness." The latter were characterized by increased adiposity, insulin resistance and dyslipidemia, as well as a more pronounced inflammatory phenotype. Le Chatelier et al. (84) and Fu et al. (87) reported that gut microbiota richness and diversity were negatively correlated with triglycerides and positively correlated with HDL levels, and that this effect was independent of age, sex and host genetics. So far, genome-wide mapping of microbiome quantitative trait loci (mbQTLs) (88) has not been performed in the context of CAD. It is a clear next step, ideally combined with comprehensive metabolome profiling of several tissues and body fluids in sufficiently large cohorts.
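Richness and diversity are usually summarized with simple indices computed from per-taxon read counts. The Python sketch below shows two of them: observed richness (the number of taxa detected) and the Shannon index H' = -Σ p_i ln p_i. It is a generic illustration with hypothetical counts. The cited studies used their own, study-specific metrics and normalization; for example, richness is often estimated after rarefying samples to equal sequencing depth.

```python
import math


def observed_richness(counts):
    """Number of taxa detected (count > 0) in one sample."""
    return sum(1 for c in counts if c > 0)


def shannon_index(counts):
    """Shannon diversity H' = -sum(p_i * ln p_i) over detected taxa."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("sample has no reads")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


# Hypothetical per-taxon read counts for two samples with equal depth.
even = [25, 25, 25, 25, 0]    # four taxa, evenly represented
skewed = [97, 1, 1, 1, 0]     # the same four taxa, one dominant
for name, sample in (("even", even), ("skewed", skewed)):
    print(name, observed_richness(sample), round(shannon_index(sample), 3))
```

Both samples have the same richness, but the evenly distributed one has a much higher Shannon diversity. This is why richness and diversity are often reported separately.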

### INTEGRATING GENETIC VARIATION AND MULTIPLE OMICS DATASETS

An integrative analysis of genetic variation and transcriptome data with additional high-throughput measurements may greatly improve the predictive power of disease networks, as suggested by Zhu et al. (89). However, the number of studies conducting multi-omics integration in the context of CAD is so far limited. Miller et al. (90) integrated genetic variation with investigations of chromatin state, enhancer activity and transcription factor (TF) binding in human coronary artery smooth muscle cells. They demonstrated, for example, that one of the lead candidate variants, rs17293632, located within an intron of the SMAD3 gene, overlaps an open chromatin region. Moreover, the major risk C allele was more strongly associated with open chromatin and resided in a canonical AP-1 motif, which was effectively destroyed by the minor protective T allele. Preferential AP-1 binding to the risk C allele was experimentally validated using allele-specific ChIP analyses.

Kraus et al. (42) performed a pathway-level integrative analysis that linked genetics, epigenetics, transcriptomics, and metabolomics profiles and implicated the ubiquitin proteasome system in cardiovascular disease pathogenesis. This study observed associations of circulating SCDA with variants in ER stress genes. Several of these variants (**Table 1** and **Figure 8**), in the FBXO25 and SUGT1 genes, also showed evidence of cis-regulation in eQTL analyses and independently predicted CAD events (42). Moreover, two other genes from the same ER stress pathway, BRSK2 and HOOK2, were differentially methylated when individuals with high and low SCDA levels were compared. In subsequent experimental validation, cultured human kidney cells were exposed to fatty acid levels found in individuals with cardiometabolic disease. This induced accumulation of SCDA metabolites in parallel with increases in the ER stress marker BiP (42).
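Allele-specific assays compare, in heterozygous samples, the number of reads carrying each allele; without imbalance, about half of the reads are expected to come from each allele. This applies to the allele-specific ChIP described above and to the allelic-imbalance analyses of TCF21 mentioned earlier. The Python sketch below implements an exact two-sided binomial test for that comparison. The read counts are hypothetical. Real analyses also need to handle reference-mapping bias and overdispersion across samples, which this sketch ignores.

```python
from math import comb


def allelic_imbalance_p(risk_reads, total_reads, expected=0.5):
    """Exact two-sided binomial test for allelic imbalance.

    Sums the probabilities of all read splits that are no more likely than
    the observed one under Binomial(total_reads, expected).
    """
    if not 0 <= risk_reads <= total_reads:
        raise ValueError("risk_reads must lie between 0 and total_reads")
    probs = [comb(total_reads, k) * expected**k * (1 - expected) ** (total_reads - k)
             for k in range(total_reads + 1)]
    observed = probs[risk_reads]
    # A small relative tolerance keeps ties from being dropped by rounding error.
    p_value = sum(q for q in probs if q <= observed * (1 + 1e-7))
    return min(1.0, p_value)


# Hypothetical example: 30 of 40 ChIP reads in heterozygotes carry the risk allele.
print(allelic_imbalance_p(30, 40))  # about 0.0022
```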

Shu et al. (20) investigated shared genetic regulatory networks for CAD and type 2 diabetes (T2D), and their key intervening drivers, in multiple populations of diverse ethnicities. To do so, they performed an integrative analysis of five multi-ethnic GWAS for CAD and T2D, eQTLs, ENCODE data, and tissue-specific gene network models (both co-expression and graphical models) from disease-relevant tissues. This study identified several shared pathogenic processes for both diseases: pathways regulating the metabolism of lipids, glucose and branched-chain amino acids, as well as pathways governing oxidation, extracellular matrix and immune response. It also identified 15 key drivers, including HMGCR, CAV1, IGF1, and PCOLCE, whose network neighbors collectively accounted for ∼35% of known GWAS hits for CAD and 22% for T2D (20). Laurila et al. (43) combined QTL and canonical pathway analyses to link genomic and transcriptome data from subcutaneous adipose tissue with plasma HDL lipidomics profiling. Their results highlighted a shift in HDL particle quality toward a putatively more inflammatory and less atheroprotective phenotype in subjects with low HDL, due to reduced antioxidative capacity. Within the HLA region, this study found two significant, dose-dependent cis-eQTL associations with low HDL and inflammatory pathways: rs241437 in an intron of TAP2, and rs9272143 between HLA-DRB1 and HLA-DQA1. The latter was also associated with down-regulation of antioxidative pathways in HDL particles (43).
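A summary statistic such as "network neighbors collectively accounted for ∼35% of known GWAS hits" can be read as a coverage fraction: the share of GWAS genes that are key drivers or lie in their network neighborhoods. The Python sketch below computes that fraction for direct (first-degree) neighbors in an undirected gene network. The neighborhood definition, the toy network and all gene names are hypothetical assumptions. The key-driver analysis itself, which selects drivers whose neighborhoods are enriched for disease genes, is not reproduced here.

```python
def gwas_coverage(network, key_drivers, gwas_genes):
    """Fraction of GWAS genes that are key drivers or direct neighbors of one.

    network: dict mapping each gene to the set of genes it is connected to.
    """
    covered = set()
    for driver in key_drivers:
        covered.add(driver)
        covered |= network.get(driver, set())
    gwas = set(gwas_genes)
    if not gwas:
        return 0.0
    return len(covered & gwas) / len(gwas)


# Hypothetical toy network and gene lists.
toy_network = {"KD1": {"G1", "G2", "G3"}, "KD2": {"G3", "G4"}}
print(gwas_coverage(toy_network, ["KD1", "KD2"], ["G1", "G3", "G5", "G6"]))  # 0.5
```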

The application of multi-omics integration in the field of CAD has so far been limited (22). One of the main reasons is the current lack of appropriate data in sufficiently large cohorts. However, given the great promise such studies hold for precision medicine, parallel measurements on multiple omics layers are expected to be collected rapidly over the next few years. These will also allow comprehensive comparison, validation and improvement of existing computational integration methods.

### MITOCHONDRIAL GENETIC VARIATION AND DOWNSTREAM OMICS DATASETS

Mitochondrial dysfunction has been increasingly associated with obesity-related cardiometabolic diseases and CAD (91). A better understanding of CAD genetics therefore requires exploring genetic variation that affects mitochondria at two levels. The first is the mitochondrial DNA (mtDNA), which encodes 37 genes (13 OXPHOS protein subunits, 22 tRNAs and 2 rRNAs). The second is the >1,000 nuclear-encoded genes whose products are imported into mitochondria and are essential for their proper functioning. The mitochondrial haplogroup T (45) and the mtDNA variants m.16189T>C (46) and m.15927G>A (47) have been associated with CAD in different ethnic groups. Another mitochondrial variant, m.8701A>G, has been associated with hypertension (44). This variant is located in the MT-ATP6 (ATP synthase/complex V F0 subunit 6) gene, which encodes a subunit of ATP synthase, the enzyme responsible for the final step of oxidative phosphorylation. Functional studies in transmitochondrial hybrid cells (cybrids) have shown that this variant alters mitochondrial matrix pH and intracellular calcium dynamics (**Figure 9**) (92).

Similarly, other mitochondria-related omics data could be of interest in the context of CAD. Baccarelli et al. (93) reported that mitochondrial genes, including the ATP synthesis-related cytochrome c oxidase genes (MT-CO1, MT-CO2, and MT-CO3) and MT-TL1, were hypermethylated in platelets of CAD cases compared with healthy controls. We recently combined three data sources: eQTLs in seven CAD-relevant vascular and metabolic tissues (53), established CAD risk loci from GWAS (9), and time-resolved transcriptome data from the aortic arch of mice with reversible hypercholesterolemia (94, 95). This analysis demonstrated a massive down-regulation of nuclear-encoded mitochondrial genes (96), specifically at the time of rapid atherosclerotic lesion expansion and foam cell formation. The down-regulation was largely reversible by genetically lowering plasma cholesterol. These mitochondrial signature genes were supported as causal for CAD in humans, as eQTLs representing their genes significantly overlapped with disease risk SNPs. In line with this, the STARNET (28) study recently examined mitochondrial (i.e., mtDNA-derived) gene expression. It found markedly lower expression of mitochondrial genes in the atherosclerotic aortic arterial wall than in the non-atherosclerotic arterial wall.

Furthermore, genetic variation affecting the mitochondrial metabolome has remained largely unexplored. Hartiala et al. (41) searched for genetic factors associated with plasma betaine levels and determined their effect on CAD risk. This resulted in the identification of two significantly associated loci, on chromosomes 2q34 and 5q14.1. The lead variant at 2q34, rs715, localized to carbamoyl-phosphate synthase 1 (CPS1). CPS1 encodes a mitochondrial enzyme that catalyzes the first committed and rate-limiting step of the urea cycle. Rs715 was also significantly associated with decreased levels of urea cycle metabolites and increased plasma glycine levels. Notably, rs715 showed a highly significant association with decreased risk of CAD in women (41).

Finally, in recent years, it has become increasingly evident that the gut microbiome produces metabolites that influence mitochondrial function and biogenesis (97), hence the ancestral gut microbiome-mitochondrion connection and its relation to CAD might need to be explored in the near future, as well.

Recent progress in next-generation sequencing (NGS) techniques has set the scene for a second "gold rush" in mitochondrial genomics, and mtDNAs are presently the most sequenced type of eukaryotic chromosome (98). At the same time, multi-omics investigations of mitochondria that map genomes, transcriptomes, proteomes, and metabolomes in parallel have not yet been conducted outside yeast (99). Hence, although mitochondrial dysfunction has been associated with many human diseases, the respective proteins and pathways are not well characterized (99). This presents an exciting future field of investigation, especially because mitochondria play a key role in plasticity and adaptation to environmental change, including adaptation to physiological stress (100).

### CONCLUSIONS AND FUTURE DIRECTIONS

Like other common complex disorders, CAD develops over time and involves both genetics and environment. Full mechanistic insight will therefore require coordinated sets of omics data from multiple time points, collected from many disease-relevant tissues and body fluids in sufficiently large cohorts (20, 21). Environmental risk factors can interact with the genome and perturb the epigenome to further modulate the transcriptome and proteome (20). Therefore, comprehensive monitoring and careful documentation of multiple environmental and lifestyle factors over time, i.e., the envirome, will be indispensable for gaining significant insights into the complex etiology of CAD. Imaging and electronic health record data will also need to be considered. As more omics and other data are generated, novel methods for data integration, modeling, visualization and interpretation will be urgently needed to cope with these multi-dimensional data (101) and to translate them into actionable precision medicine tools.

Although there has been major progress in the development of multi-dimensional data integration algorithms and tools, the field is still in its infancy. The flexibility, effectiveness and robustness of data integration for extracting biological insights remain limited, especially when clinical outcomes (e.g., stable CAD vs. MI) need to be modeled (22, 101). In addition, we still face a number of technical challenges related to patient sampling and profiling. For example, as recognized by Hasin et al. and others (20, 21), human studies are often affected by confounding factors that are difficult or even impossible to control for (e.g., diet and medications). The available sample size will also play an important role, both for the multi-omics approach to produce meaningful insights into CAD (21) and for building reliable prediction models that allow more efficient design of therapeutics tailored to individual needs. According to Hasin et al., an underpowered study may not only miss true signals but is also more likely to produce false positive results (21). Furthermore, data analysis requirements, e.g., sufficient depth of coverage for RNA-seq experiments, have to be considered carefully before and during data collection (21).

### AUTHOR CONTRIBUTIONS

BV and HS drafted and edited the manuscript.

### FUNDING

This work was supported by grants from the Fondation Leducq [CADgenomics, 12CVD02], the German Federal Ministry of Education and Research (BMBF) within the framework of ERA-NET on Cardiovascular Disease, Joint Transnational Call 2017 [ERA-CVD: grant JTC2017\_21-040], within the framework of target validation [BlockCAD: 16GW0198K], within the framework of the e:Med research and funding concept [AbCD-Net: grant 01ZX1706C and e:AtheroSysMed: grant 01ZX1313A-2014], and the European Union Seventh Framework Programme FP7/2007–2013, under grant agreement no. HEALTH-F2-2013-601456 (CVgenes-at-target). Further grants were received from the Deutsche Forschungsgemeinschaft (DFG) as part of the Sonderforschungsbereich CRC 1123 (B2).

### REFERENCES


from genome-wide association studies. Circulation (2017) 10:e001487. doi: 10.1161/CIRCGENETICS.116.001487


following thrombotic and non-thrombotic myocardial infarction. J Proteom. (2017) 160:38–46. doi: 10.1016/j.jprot.2017.03.014


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Vilne and Schunkert. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Serum Biomarkers of Endothelial Dysfunction in Fabry Associated Cardiomyopathy

Jefferson Loso<sup>1</sup>, Natalie Lund<sup>1</sup>, Maxim Avanesov<sup>2</sup>, Nicole Muschol<sup>3</sup>, Susanne Lezius<sup>4</sup>, Kathrin Cordts<sup>5,6</sup>, Edzard Schwedhelm<sup>5,6</sup> and Monica Patten<sup>1,6</sup>\*

<sup>1</sup> Department of General and Interventional Cardiology, University Heart Center Hamburg, Hamburg, Germany, <sup>2</sup> Department of Diagnostic and Interventional Radiology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany, <sup>3</sup> Department of Pediatrics, University Medical Center Hamburg-Eppendorf, Hamburg, Germany, <sup>4</sup> Department of Medical Biometry and Epidemiology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany, <sup>5</sup> Department of Experimental Pharmacology and Toxicology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany, <sup>6</sup> DZHK (German Center for Cardiovascular Research e.V.), Hamburg, Germany

Background: Fabry disease (FD) is characterized by early development of vasculopathy and endothelial dysfunction. However, it is unclear whether these processes also play a pivotal role in cardiac manifestations. As Fabry cardiomyopathy (FC) is the leading cause of death in FD, we aimed to gain better insight into the pathological mechanisms of the disease.

### Edited by:

Tanja Zeller, Universität Hamburg, Germany

### Reviewed by:

Nazareno Paolocci, Johns Hopkins University, United States Alexander Pott, Universitätsklinikum Ulm, Germany

> \*Correspondence: Monica Patten patten@uke.de

### Specialty section:

This article was submitted to Cardiovascular Genetics and Systems Medicine, a section of the journal Frontiers in Cardiovascular Medicine

> Received: 20 February 2018 Accepted: 17 July 2018 Published: 15 August 2018

### Citation:

Loso J, Lund N, Avanesov M, Muschol N, Lezius S, Cordts K, Schwedhelm E and Patten M (2018) Serum Biomarkers of Endothelial Dysfunction in Fabry Associated Cardiomyopathy. Front. Cardiovasc. Med. 5:108. doi: 10.3389/fcvm.2018.00108

Methods: Serum samples were obtained from 17 healthy controls, 15 FD patients with FC and 7 FD patients without FC. FC was defined as LV wall thickening >12 mm on cardiac magnetic resonance imaging. Serum levels of proBNP, high-sensitivity troponin T (hsT), and globotriaosylsphingosine (lyso-Gb3) were obtained. A multiplex ELISA assay for 23 different angiogenesis markers was performed in pooled samples. Markers showing significant differences among groups were further analyzed in individual samples using specific ELISA antibody assays. L-homoarginine (hArg), L-arginine, asymmetric dimethylarginine (ADMA), and symmetric dimethylarginine (SDMA) were quantified by liquid chromatography-mass spectrometry.

Results: Angiostatin and matrix metalloproteinase 9 (MMP-9) were elevated in FD patients compared to controls, independently of the presence of FC (angiostatin: 98 ± 25 vs. 75 ± 15 ng/mL; p = 0.001; MMP-9: 8.0 ± 3.4 vs. 5.0 ± 2.4 µg/mL; p = 0.002). SDMA concentrations were highest in patients with FC (0.90 ± 0.64 µmol/l) compared to patients without FC (0.57 ± 0.10 µmol/l; p = 0.027) and to controls (0.58 ± 0.12 µmol/l; p = 0.006). SDMA was also positively correlated with indexed LV mass (r = 0.61; p = 0.003), hsT (r = 0.56, p = 0.008), and lyso-Gb3 (r = 0.53, p = 0.013). Accordingly, the ratio of L-homoarginine to SDMA (hArg/SDMA) was lowest in patients with FC (2.63 ± 1.78) compared to controls (4.16 ± 1.44; p = 0.005). For L-arginine, hArg and ADMA, no significant differences among groups were detected, although a trend toward higher ADMA and lower hArg levels was observed in the FC group. Furthermore, a significant relationship between kidney and cardiac function was observed (p = 0.045).

Conclusion: Elevated MMP-9 and angiostatin levels suggest an increased extracellular matrix turnover in FD patients. Furthermore, endothelial dysfunction may also be involved in FC, as SDMA and hArg/SDMA are altered in these patients.

Keywords: matrix metalloproteinase 9, angiostatins, SDMA, homoarginine, Fabry disease, Fabry cardiomyopathy, vasculopathy, endothelial dysfunction

# INTRODUCTION

Fabry disease (FD) is an X-linked recessive multisystemic storage disorder caused by decreased activity of the lysosomal enzyme α-galactosidase A (GLA) (1, 2). As a consequence, neutral glycosphingolipids, predominantly globotriaosylceramide (Gb3), accumulate in vascular lysosomes, in plasma, and in various tissues throughout the body (3). Typical manifestations of FD are cardiac, neurological, renal, ocular, dermatological, and gastrointestinal (4–6), and cardiovascular disease is the leading cause of death in FD patients (7). The prognosis of cardiomyopathy is particularly poor (8); thus, appropriate diagnosis and treatment of Fabry associated cardiomyopathy (FC) are crucial. Enzyme replacement therapy (ERT) has been shown to significantly reduce accumulation of Gb3, especially intracellular deposits in the coronary endothelium (9, 10), and to halt or even partially reverse FC. However, in advanced stages of FD with a severe cardiac phenotype, the effectiveness of ERT is profoundly diminished and the disease can even progress (11–13). Therefore, a better understanding of the mechanisms contributing to the development of FC is urgently needed to improve treatment and outcome of FD patients.

Clinical studies provide evidence of increased intima-media thickness (IMT) and impaired flow-mediated dilatation in FD (14, 15), indicating an early onset of atherosclerosis in these patients. Moreover, several studies suggest that myocardial fibrosis, detected by cardiac magnetic resonance imaging (cMRI), may contribute to left ventricular remodeling in FD (16, 17). Myocardial fibrosis and subsequent remodeling are driven by altered extracellular matrix turnover, which is catalyzed by matrix metalloproteinases (MMPs) including MMP-9 (18). This is in line with the detection of increased serum MMP-9 levels in Fabry patients compared to controls (19). Matsunaga et al. showed that inhibition of NO synthase resulted in increased MMP-9 and MMP-2 activities, suggesting a link between oxidative stress and extracellular matrix turnover. Several clinical and experimental studies demonstrated inflammatory activity and endothelial nitric oxide synthase (eNOS) alterations in vascular cells of FD patients (20–23). These findings support the hypothesis of early vasculopathy and endothelial dysfunction in FD. Whether these processes play a pivotal role in cardiac manifestation has not yet been sufficiently investigated. Accordingly, the aim of this study was to gain insight into the underlying pathological mechanisms by determining serum markers of endothelial dysfunction, angiogenesis, and cardiac function in FD patients with and without FC.

# METHODS

### Study Population

Serum samples from 15 FD patients with FC, 7 FD patients without FC, and 17 healthy controls were collected between September 2014 and December 2016. FD was confirmed by molecular genetic analysis, which revealed the following mutations: 4× p.N215S, 2× p.E341K, 2× c.1277\_1278delAA, 2× c.718\_719delAA, p.N320I, p.A143T, p.A230\_I232del, p.Q327L, p.A389V, c.717delAA, p.I384N, p.P205T, p.S247P, p.Q327L, p.R227Q. FC was defined as an LV wall thickness >12 mm on cMRI. At the time of blood sampling, 11 FD patients with FC and one patient without FC were receiving ERT. The study conformed to the principles outlined in the Declaration of Helsinki and was approved by the local ethics committee. All participants gave written informed consent for participation in the study.

### Enzyme-Linked Immunosorbent Assays

Blood samples were centrifuged at 4000 × g for 10 min at room temperature, and the serum obtained was aliquoted and stored at −80 °C until use. For the multiplex enzyme-linked immunosorbent assay (ELISA), a Human Angiogenesis Antibody Array Membrane (Abcam PLC, Cambridge, UK, ab169808) was used. Aliquots of serum samples were pooled into three groups: FD with FC (n = 15), FD without FC (n = 7), and controls (n = 17). The assay allowed simultaneous, semiquantitative analysis of 23 targets (Angiopoietin 1, Angiopoietin 2, Angiostatin, Endostatin, G-CSF, GM-CSF, I-309, IL-10, IL-1 alpha, IL-1 beta, IL-2, IL-4, I-TAC, MCP-3, MCP-4, MMP-1, MMP-9, PECAM-1, Tie-2, TNF alpha, suPAR, VEGFR2, VEGFR3) and was performed in duplicate. Membrane signals were visualized with the ChemiDoc™ MP Imaging System and quantified with the densitometry software Image Lab™.
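Because the array readout is semiquantitative, group comparisons rest on normalized spot densities. The following is a minimal sketch of one common normalization scheme, assuming background subtraction and scaling to each membrane's positive-control spots; the source does not describe its normalization steps, and all densities below are hypothetical placeholders, not study data.

```python
from statistics import mean

def normalized_signal(target_spots, pos_ctrl_spots, background, ref_pos_ctrl):
    """Background-corrected mean of duplicate target spots, scaled so that this
    membrane's positive controls match a reference membrane's positive-control signal.
    (Assumed normalization scheme, for illustration only.)"""
    target = mean(target_spots) - background
    pos_ctrl = mean(pos_ctrl_spots) - background
    return target * ref_pos_ctrl / pos_ctrl

# Hypothetical densities (arbitrary units) for one target on two pooled membranes
control_pos = mean([5100, 4900]) - 100  # control membrane serves as the reference
control = normalized_signal([1200, 1000], [5100, 4900], 100, control_pos)
fd_pool = normalized_signal([2100, 1900], [4000, 3800], 100, control_pos)
print(round(fd_pool / control, 2))  # prints 2.45 (fold change of the FD pool vs. controls)
```

Pooling yields one value per group, so a fold change like this carries no within-group variance; this is why candidate markers were followed up with single-sample ELISAs.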

Markers showing significant differences among groups in the multiplex ELISA were further analyzed in individual samples. Specific ELISAs from Abcam were used to quantify MMP-9, angiostatin, soluble urokinase-type plasminogen activator receptor (suPAR), and vascular endothelial growth factor (VEGF) according to the manufacturer's instructions.

### Liquid Chromatography—Tandem Mass Spectrometry (LC-MS/MS) Measurements

L-Arginine, L-homoarginine (hArg), asymmetric dimethylarginine (ADMA), and symmetric dimethylarginine (SDMA) were quantified as described previously (24, 25). In brief, 25 µL of EDTA plasma were diluted with 100 µL of <sup>2</sup>H<sub>7</sub>-arginine, <sup>13</sup>C<sub>7</sub>,<sup>15</sup>N<sub>4</sub>-hArg, and <sup>2</sup>H<sub>7</sub>-ADMA dissolved in methanol as internal standards. Proteins were precipitated, and the analytes were derivatized to their butyl esters. Twenty microliters of the reconstituted samples were injected into the 1200 L Triple Quadrupole LC-MS/MS system (Agilent Technologies, Waldbronn, Germany). Analytes were separated on a Polaris C18-Ether column (Agilent Technologies; 50 × 2.0 mm) using an elution gradient of two mobile phases: (A) 1 mL/L formic acid in water and (B) acetonitrile-methanol (50/50, vol/vol) containing 1 mL/L formic acid (A/B: 0:00 min 95/5; 0:30 min 95/5; 2:00 min 50/50; 2:01 min 95/5; 4:00 min 95/5). The flow rate was 0.3 mL/min. Peak area ratios relative to the internal standards were evaluated with external calibration curves prepared in dialysed EDTA plasma. Intra- and interassay coefficients of variation were below 15% for all analytes.
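Quantification by internal standard and external calibration reduces to a calibration line of peak-area ratio (analyte/internal standard) versus concentration, plus a precision check via the coefficient of variation. A minimal sketch, assuming an unweighted linear calibration (the source does not state the regression model) and using hypothetical calibrator values:

```python
from statistics import mean, stdev

def fit_line(x, y):
    """Unweighted least-squares fit y = slope * x + intercept (assumed model)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def back_calculate(analyte_area, is_area, slope, intercept):
    """Concentration from the analyte / internal-standard peak area ratio."""
    return (analyte_area / is_area - intercept) / slope

def cv_percent(replicates):
    """Coefficient of variation (%) of replicate measurements."""
    return 100 * stdev(replicates) / mean(replicates)

# Hypothetical calibrators (umol/L) and their measured area ratios
conc = [0.25, 0.5, 1.0, 2.0]
ratios = [0.22, 0.42, 0.82, 1.62]
slope, intercept = fit_line(conc, ratios)
sample = back_calculate(6200, 10000, slope, intercept)  # about 0.75 umol/L
cv = cv_percent([0.74, 0.75, 0.76])                     # about 1.3 %, below the 15 % limit
```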

### Cardiac MRI

Cardiac MRI was performed on a 1.5 T MRI scanner (Achieva, Philips Medical Systems, Best, The Netherlands). The examination included retrospectively gated cine MRI in cardiac short- and long-axis orientations using a steady-state free precession (SSFP) sequence to quantify regional and global left ventricular (LV) function and LV myocardial mass. Ten minutes after bolus injection of 0.075 mmol/kg Gd-BOPTA (MultiHance®), end-diastolic late gadolinium enhancement (LGE) images were acquired with phase-sensitive inversion recovery (PSIR) sequences to quantify areas of myocardial fibrosis. LGE images were obtained in the LV short axis as well as in two-, three-, and four-chamber views and quantified using cvi42® software (Circle Cardiovascular Imaging Inc., Calgary, Alberta, Canada). Quantified parameters included LV function, end-diastolic and end-systolic volumes, stroke volume, ejection fraction, LV mass, regional wall thickness, and LGE. LV mass was indexed to body surface area calculated with the DuBois & DuBois formula.
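Indexing LV mass to body surface area uses the DuBois & DuBois formula, BSA (m²) = 0.007184 × weight(kg)^0.425 × height(cm)^0.725. A short sketch with a hypothetical patient:

```python
def bsa_dubois(weight_kg: float, height_cm: float) -> float:
    """Body surface area in m^2 by the DuBois & DuBois formula."""
    return 0.007184 * weight_kg ** 0.425 * height_cm ** 0.725

def lv_mass_index(lv_mass_g: float, weight_kg: float, height_cm: float) -> float:
    """LV mass indexed to body surface area, in g/m^2."""
    return lv_mass_g / bsa_dubois(weight_kg, height_cm)

# Hypothetical patient: 70 kg, 175 cm, cMRI LV mass 150 g
bsa = bsa_dubois(70, 175)           # about 1.85 m^2
lvmi = lv_mass_index(150, 70, 175)  # about 81 g/m^2
```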

### Statistical Analysis

Data were analyzed using SPSS, version 23. Non-normally distributed variables were log-transformed where necessary. Group differences were assessed by analysis of variance (ANOVA) or, with additional adjustment for age, sex, and eGFR, by analysis of covariance (ANCOVA). Post-hoc group comparisons were performed only if the global difference was significant. Correlations were assessed with Pearson's correlation coefficient. Logistic regression was used to investigate the relationship between renal and cardiac function. Concentrations are presented as mean ± standard deviation, and p < 0.05 was considered statistically significant.
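The global ANOVA test can be made concrete with a minimal pure-Python sketch of the one-way F statistic, including the optional log transform for skewed variables described above. It is not the SPSS procedure, omits the ANCOVA adjustment, and a p-value would additionally require the F distribution (for example `scipy.stats.f.sf(F, df_between, df_within)`). The data are hypothetical.

```python
import math
from statistics import mean

def one_way_anova(groups, log_transform=False):
    """Return (F, df_between, df_within) for a one-way ANOVA across groups."""
    if log_transform:
        # Natural log to reduce right skew; all values must be positive
        groups = [[math.log(v) for v in g] for g in groups]
    values = [v for g in groups for v in g]
    grand_mean = mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical values for three groups (controls, FD without FC, FD with FC)
f_stat, df_b, df_w = one_way_anova([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(f_stat, df_b, df_w)  # 27.0 2 6
```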

# RESULTS

### Baseline Characteristics

Cardiac MRI revealed significantly higher indexed LV mass in FD patients with FC than in controls and FD patients without FC, as shown in **Table 1**. Septal thickness and mean LGE size were likewise significantly greater in FD patients with FC. No differences among groups were detected for LV ejection fraction and stroke volume. Laboratory values showed significantly higher levels of proBNP, hsT, and lyso-Gb3 in the cardiomyopathy group. Moreover, patients with FC showed impaired renal function parameters.
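Renal function in Table 1 is expressed as eGFR from the CKD-EPI formula. The source does not state which version was used; assuming the 2009 CKD-EPI creatinine equation, a sketch is shown below. The later 2021 refit without a race term uses different coefficients, so the two are not interchangeable. Inputs are hypothetical.

```python
def egfr_ckd_epi_2009(scr_mg_dl: float, age_years: float, female: bool, black: bool = False) -> float:
    """eGFR in mL/min/1.73 m^2 from serum creatinine (2009 CKD-EPI creatinine equation)."""
    kappa = 0.7 if female else 0.9
    alpha = -0.329 if female else -0.411
    ratio = scr_mg_dl / kappa
    egfr = 141 * min(ratio, 1.0) ** alpha * max(ratio, 1.0) ** -1.209 * 0.993 ** age_years
    if female:
        egfr *= 1.018
    if black:
        egfr *= 1.159
    return egfr

# Hypothetical examples
male_50 = egfr_ckd_epi_2009(1.0, 50, female=False)   # about 87
female_50 = egfr_ckd_epi_2009(0.7, 50, female=True)  # about 101
```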

### Quantification of Angiogenesis Markers by Enzyme-Linked Immunosorbent Assays

The multiplex ELISA indicated differences among the pooled groups for angiostatin, MMP-9, and suPAR. **Figure 1** shows the MMP-9 and angiostatin levels from the subsequent single-analyte ELISAs. MMP-9 concentrations were 1.51 times higher in FD patients with FC and 2.07 times higher in FD patients without FC than in controls. The mean MMP-9 level of all FD patients was 1.67 times higher than that of controls (**Table 2**). Similarly, angiostatin levels were 1.33 times higher in FD patients with FC and 1.23 times higher in FD patients without FC than in controls. The mean angiostatin level of all FD patients was 1.3 times higher than that of controls (**Table 2**). However, no significant differences between the two FD groups were detected for angiostatin or MMP-9. Moreover, neither suPAR nor VEGF concentrations differed significantly between FD patients and healthy controls.
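The fold changes above are ratios of group means. Using the angiostatin means reported in the abstract (98 vs. 75 ng/mL), a small sketch reproduces the reported factor of about 1.3. The second function, a ratio of geometric means, is included only as the alternative often used with log-transformed variables; the source does not state that it was used.

```python
import math
from statistics import mean

def fold_change(group_mean: float, control_mean: float) -> float:
    """Ratio of arithmetic group means."""
    return group_mean / control_mean

def geometric_mean_ratio(group, control):
    """Ratio of geometric means, i.e. the back-transformed difference of mean logs."""
    return math.exp(mean(math.log(v) for v in group) - mean(math.log(v) for v in control))

angiostatin_fc = fold_change(98, 75)  # all FD patients vs. controls, ng/mL
print(round(angiostatin_fc, 2))       # 1.31
```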

### Quantification of Endothelial Dysfunction Markers by LC-MS/MS Measurements

LC-MS/MS measurements revealed 1.4 times higher SDMA concentrations in FD patients with FC than in FD patients without FC and, likewise, 1.4 times higher concentrations than in controls, whereas no statistically significant difference was found between FD patients without FC and controls (**Table 2**, **Figure 2**). Accordingly, the ratio of L-homoarginine to SDMA (hArg/SDMA) was 0.53 times lower in FD patients with FC than in controls. For L-arginine, hArg, ADMA, and the hArg/ADMA ratio, no significant differences among groups were detected, although a trend toward higher ADMA concentrations, lower hArg levels, and accordingly a lower hArg/ADMA ratio was observed in the FD group with FC (**Table 2**).

### Correlations of Biomarkers to Anthropometric and Clinical Phenotypes

No statistically significant age or sex dependencies were observed for MMP-9, angiostatin, SDMA, or hArg/SDMA. As shown in **Figure 3A**, SDMA correlated positively with LV mass, lyso-Gb3, and hsT and negatively with eGFR. Accordingly, as shown in **Figure 3B**, hArg/SDMA correlated negatively with LV mass and lyso-Gb3 and positively with eGFR. Moreover, indexed LV mass correlated with ADMA, hArg/ADMA, suPAR, lyso-Gb3, and cardiac and renal parameters (**Table 3**). For MMP-9 and angiostatin, no significant correlations with any of these variables were observed (data not shown).
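Each reported correlation is a Pearson coefficient tested against zero with t = r·√((n − 2)/(1 − r²)) on n − 2 degrees of freedom. A minimal sketch; the sample size behind each correlation is not stated here, so n = 22 (all FD patients) is an assumption used only to show the arithmetic for the reported r = 0.61 between SDMA and indexed LV mass.

```python
import math
from statistics import mean

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_for_r(r, n):
    """t statistic with n - 2 degrees of freedom for H0: rho = 0."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

# Reported r = 0.61 (SDMA vs. indexed LV mass); n = 22 is an assumption
t = t_for_r(0.61, 22)  # about 3.44 on 20 df, above the two-sided 5 % critical value of about 2.09
```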

### Relationship of SDMA and hArg/SDMA to Renal Function

Analysis of covariance with adjustment for eGFR attenuated the significant group differences. Therefore, SDMA and hArg/SDMA are probably dependent on kidney function. Furthermore, a logistic regression of eGFR on the two FD groups with and without FC was performed to evaluate the relationship between renal function (eGFR) and cardiac involvement (presence of FC) in FD patients. This analysis showed a significant relationship (p = 0.045), a finding supported by the correlation of renal parameters with LV mass (**Table 3**).

Data are presented as mean ± standard deviation. Significant p-values are marked in bold. Results of the analysis of variance (ANOVA) are shown. If ANOVA was significant, post-hoc group comparisons were performed. ACR, albumin-to-creatinine ratio; cMRI, cardiac magnetic resonance imaging; eGFR, estimated glomerular filtration rate using the Chronic Kidney Disease Epidemiology Collaboration (CKD-EPI) formula; ERT, enzyme replacement therapy; Fabry no FC, Fabry patients without Fabry cardiomyopathy; hsT, high-sensitivity cardiac troponin T; LGE, late gadolinium enhancement; LGE size LV mean (5th SD), 5th standard deviation of left ventricular mean late gadolinium enhancement size; LVEF, left ventricular ejection fraction; LVM indexed to BSA, left ventricular end-diastolic mass indexed to body surface area; lyso-Gb3, globotriaosylsphingosine; MSSI, Mainz severity score index; proBNP, prohormone of brain natriuretic peptide; SV, stroke volume.

FIGURE 1 | Box-plots of MMP9 (A) and angiostatin (B) concentration in 17 healthy controls, 7 FD patients without and 15 FD patients with FC. Box plots represent median, 25th and 75th percentiles. Whiskers indicate minimum and maximum without outliers and extremes. On the Y-axis concentrations of the biomarkers are presented (log scale). Brackets indicate p-values from the analysis of variance (ANOVA) with post-hoc group comparisons. MMP9, matrix metalloproteinase 9; n.s., not significant.
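The logistic regression was run in SPSS. The sketch below shows the same kind of model (FC status modeled on a single predictor, eGFR, fitted by maximum likelihood with Newton-Raphson) on hypothetical eGFR values and FC labels, not study data. The predictor is centered for numerical stability; a Wald test would use the inverse of the final information matrix.

```python
import math
from statistics import mean

def logistic_fit(x, y, iterations=25):
    """Fit P(y = 1) = 1 / (1 + exp(-(b0 + b1 * x))) by Newton-Raphson; returns (b0, b1)."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p          # score for the intercept
            g1 += (yi - p) * xi   # score for the slope
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: beta += I^-1 * score, with I the Fisher information matrix
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical data: eGFR (mL/min/1.73 m^2) and FC status (1 = FC, 0 = no FC)
egfr = [60, 70, 80, 90, 100, 110, 75, 95]
fc = [1, 1, 1, 0, 0, 0, 0, 1]
centre = mean(egfr)
b0, b1 = logistic_fit([e - centre for e in egfr], fc)
odds_ratio_per_10 = math.exp(10 * b1)  # below 1: higher eGFR, lower odds of FC
```

The labels overlap in eGFR (no threshold separates them perfectly), which guarantees a finite maximum-likelihood estimate.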

# DISCUSSION

The aim of this study was to investigate the role of vasculopathy and endothelial dysfunction in FD, with particular regard to Fabry associated cardiomyopathy. In blood samples of FD patients, generally higher levels of MMP-9 and angiostatin were detected independently of the presence of FC, supporting the hypothesis of an altered extracellular matrix (ECM) turnover in FD.

### TABLE 2 | Markers of endothelial dysfunction.

Data are presented as mean ± standard deviation. Significant p-values are marked in bold. Results of the analysis of variance (ANOVA) are shown. If ANOVA was significant, post-hoc group comparisons were performed. ADMA, asymmetric dimethylarginine; FD total, both Fabry disease groups combined; hArg, L-homoarginine; MMP-9, matrix metalloproteinase 9; SDMA, symmetric dimethylarginine; suPAR, soluble urokinase-type plasminogen activator receptor; VEGF, vascular endothelial growth factor. \*Mean ± standard deviation of both FD groups combined (2 + 3) \*\*p-value of t-test: 2 + 3 vs. 1.

FIGURE 2 | Box-plots of SDMA concentrations (A) and the ratio of L-homoarginine and SDMA (B) in 17 healthy controls, 7 Fabry patients without and 15 Fabry patients with FC. Box plots represent median, 25th and 75th percentiles. Whiskers indicate minimum and maximum without outliers and extremes. On the Y-axis concentrations of the biomarkers are presented (log scale). Brackets indicate p-value from the analysis of variance (ANOVA) with post-hoc group comparisons. n.s., not significant; SDMA, symmetric dimethylarginine.

Patients with FC had higher SDMA concentrations and a decreased hArg/SDMA ratio compared with healthy controls as well as with FD patients without overt cardiomyopathy. These parameters correlated with ventricular mass as well as with cardiac and renal markers, suggesting a potential causal relationship between kidney function and cardiac disease progression.

### MMP-9, Angiostatin, and suPAR

MMP-9 is part of a family of endogenous zinc-dependent endopeptidases. In the myocardium, MMPs play an important role in the structural integrity of the ECM (18). In patients with familial hypertrophic cardiomyopathy (HCM), an association of MMP-9 with gadolinium enhancement on cardiac MRI was recently described, and an important role of the MMP system in cardiac remodeling and fibrosis was proposed (26). In this context, Shah et al. found significantly higher levels of MMP-9 in 29 FD patients than in 21 healthy controls and hypothesized that MMP-9 plays an important role in the pathogenesis of FC and might be a valuable surrogate marker of the response to ERT (19). Our findings confirm higher MMP-9 levels in FD patients. However, an association between MMP-9 levels and Fabry associated cardiomyopathy could not be confirmed.

MMP-9 cleaves matrix-bound plasminogen into angiostatin (27), a potent inhibitor of angiogenesis that has been shown to attenuate endothelial cell proliferation and migration (28). The role of angiostatin in FD has not yet been investigated. Consistent with the elevated MMP-9 levels, angiostatin concentrations were increased in FD compared with controls and showed a trend toward higher concentrations in the FC group than in FD patients without FC.

Interestingly, Matsunaga et al. demonstrated that reduced NO production leads to increased MMP-2 and MMP-9 activity and higher angiostatin concentrations, concluding that NO production influences coronary angiogenesis (29). Furthermore, an experimental study by Takahashi et al. revealed that angiostatin inhibits VEGF-stimulated NO production in human umbilical vein endothelial cells (30). In this context, higher MMP-9 and angiostatin levels may contribute to altered NO synthesis in FD. Moreover, an experimental animal study by Givvimani et al. found a switch to higher levels of MMP-9 and anti-angiogenic markers such as angiostatin in the transition from compensatory hypertrophy to decompensated heart failure (31). Both the effects of MMP-9 and angiostatin on NO synthesis and a possible transition to decompensated heart failure in FC should be addressed in future studies.

Results from the correlation tests of laboratory values and markers of endothelial dysfunction correlating with indexed left ventricular mass. ACR, albumin-to-creatinine ratio; ADMA, asymmetric dimethylarginine; eGFR, estimated glomerular filtration rate using the Chronic Kidney Disease Epidemiology Collaboration (CKD-EPI) formula; hArg, L-homoarginine; hsT, high-sensitivity cardiac troponin T; lyso-Gb3, globotriaosylsphingosine; proBNP, prohormone of brain natriuretic peptide; r, Pearson's correlation coefficient; SDMA, symmetric dimethylarginine; suPAR, soluble urokinase-type plasminogen activator receptor.

Soluble uPAR is an important regulator of ECM proteolysis and is also involved in MMP activation through plasmin generation (32). For suPAR, only a trend toward higher concentrations in FC was found; however, suPAR correlated positively with LV mass, as shown in **Table 3**. Soluble uPAR has been shown to correlate directly with proteinuria (33). Consistent with this, suPAR correlated strongly with renal parameters, especially proteinuria (r = 0.74, p < 0.001).

### Endothelial Dysfunction in Fabry Associated Cardiomyopathy

ADMA, SDMA, and hArg are non-proteinogenic amino acids structurally related to L-arginine. hArg has been shown to serve as an alternative substrate for NOS and to inhibit arginase; thus, it is considered to increase NO formation (34, 35). In addition, low circulating concentrations of hArg have been proposed as a cardiovascular disease risk factor (36). ADMA, on the other hand, is an endogenous inhibitor of NOS (37), whereas its structural isomer SDMA does not directly interfere with NOS (38). However, SDMA inhibits tubular L-arginine reabsorption in the kidneys (39) and the y<sup>+</sup> transporter, which mediates the cellular uptake of L-arginine (40). Therefore, SDMA affects NOS indirectly. Both dimethylarginines, ADMA and SDMA, are involved in endothelial dysfunction (41, 42), oxidative stress (43, 44), and atherosclerosis (45). A recent meta-analysis of prospective studies by Schlesinger et al. concluded that both markers are independently associated with cardiovascular disease and all-cause mortality (46).

In this context, the present study revealed a correlation of hArg, ADMA, and the hArg/ADMA ratio with LV mass. Although ADMA and hArg showed a trend toward altered concentrations in FC patients, the group differences were not significant. The hArg/ADMA ratio performed better than the single markers, but it likewise missed significance in the overall F-test of the analysis of variance (p = 0.06).

However, in our cohort, higher SDMA levels in FC patients correlated with LV mass, hsT, and lyso-Gb3 concentrations, suggesting the presence of endothelial dysfunction in these patients. Furthermore, this points to a new possible mechanism of NO alteration in FD, as this is the first study demonstrating higher SDMA levels in the sera of FC patients. Similar results were found for the hArg/SDMA ratio. Although the alteration of the hArg/SDMA ratio was not statistically superior to the higher SDMA concentrations in FC, the group differences in this ratio suggest that hArg might also contribute to dysfunctional NO synthesis in FC. Given the antagonistic effects of hArg and SDMA on arginine metabolism, further investigation of this ratio seems warranted.

The discrepancy between significantly higher SDMA levels and the absence of ADMA group differences might be explained by their different routes of elimination. SDMA elimination is almost exclusively renal, whereas ADMA is also degraded enzymatically by the dimethylarginine dimethylaminohydrolases (DDAHs). DDAH-1, for example, is highly expressed in the kidney and liver (47). As FD patients do not typically show liver dysfunction, enzymatic ADMA elimination might be sufficient, whereas SDMA accumulates because of the renal insufficiency that typically occurs early in FD. In this regard, adjustment for eGFR showed a significant dependency of SDMA and hArg/SDMA on kidney function. Furthermore, an association between eGFR and FC was found. It is well known that diastolic dysfunction caused by impaired LV relaxation may lead to congestive heart failure and consequently to renal insufficiency. In our cohort, however, only 4 patients with FC presented with mild (grade 1) and 1 patient with moderate (grade 2) diastolic dysfunction. Based on these findings, one might therefore speculate that renal insufficiency contributes to FC through accumulation of SDMA and its negative effect on NO synthesis in the vascular system. However, a causal relationship between renal and cardiac function in Fabry disease remains to be established in further studies.
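The elimination argument can be illustrated with a one-compartment steady-state relation, C_ss = production rate / total clearance. The split between renal and enzymatic (DDAH) clearance below is hypothetical and chosen only to show the direction of the effect; it is not measured in this study, and renal clearance is assumed to fall in proportion to GFR.

```python
def steady_state(production_rate, renal_clearance, enzymatic_clearance=0.0):
    """Steady-state concentration when constant production equals first-order elimination."""
    return production_rate / (renal_clearance + enzymatic_clearance)

# Hypothetical, arbitrary units: baseline concentration 1.0 for both metabolites
sdma_base = steady_state(1.0, renal_clearance=1.0)
adma_base = steady_state(1.0, renal_clearance=0.2, enzymatic_clearance=0.8)

# Halve renal clearance (assumed to track a 50 % fall in GFR), production unchanged
sdma_ckd = steady_state(1.0, renal_clearance=0.5)
adma_ckd = steady_state(1.0, renal_clearance=0.1, enzymatic_clearance=0.8)

print(round(sdma_ckd / sdma_base, 2), round(adma_ckd / adma_base, 2))  # 2.0 1.11
```

Under these assumptions, the purely renally cleared metabolite doubles while the mostly enzymatically cleared one rises by about 11 %.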

# CONCLUSIONS

This study provides evidence of altered ECM turnover, with higher levels of MMP-9 and angiostatin in FD patients independent of the presence of FC. Moreover, patients with FC showed higher SDMA levels and a lower hArg/SDMA ratio, which correlated with LV mass, hsT, and lyso-Gb3 concentrations but also with impaired renal function. The relationship between renal and cardiac function leads to the hypothesis that accumulation of SDMA due to renal insufficiency in FD might contribute to the development of endothelial dysfunction and subsequently to FC.

### Limitations

A major limitation of this study is the small sample size of 22 FD patients, with only 7 patients without FC compared with more than twice as many patients with FC. Further limitations are the unequal distribution of male and female patients in the FD groups and the single-center design. Moreover, this study design cannot establish whether the relationship between kidney and cardiac function is causal.

### Perspectives

Multicenter studies including more patients are required to validate these findings, to investigate the effects of MMP-9 and angiostatin on endothelial dysfunction in FD, and to clarify the pathological impact of SDMA accumulation in Fabry associated cardiomyopathy.

# ETHICS STATEMENT

This study was carried out in accordance with the principles outlined in the Declaration of Helsinki. The protocol was approved by the local ethics committee of the Ärztekammer Hamburg, Germany (approval no: PV4056).

# AUTHOR CONTRIBUTIONS

JL: writing of the manuscript, experimental work. NL: experimental work, writing assistance. MA: MRI data acquisition, critical review of the data. NM: acquisition of clinical data, critical manuscript review. SL: statistical analysis. KC and ES: supervision and design of experiments, critical review of data analysis and manuscript. MP: experimental design, supervision of data analysis, writing of the manuscript.

# FUNDING

This project was funded by a research donation from Shire Deutschland GmbH to the University Medical Center Hamburg-Eppendorf.

# REFERENCES

evidence for disease progression towards serious complications. J Intern Med. (2013) 274:331–41. doi: 10.1111/joim.12077

deficient in alpha-galactosidase A. Am J Physiol Gastrointest Liver Physiol. (2014) 306:G140–6. doi: 10.1152/ajpgi.00185.2013


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The handling Editor declared a shared affiliation, though no other collaboration, with authors JL, NL, MP.

Copyright © 2018 Loso, Lund, Avanesov, Muschol, Lezius, Cordts, Schwedhelm and Patten. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Exploring Coronary Artery Disease GWAs Targets With Functional Links to Immunometabolism

Maria F. Hughes<sup>1,2,3,4</sup>\*, Yvonne M. Lenighan<sup>1,4</sup>, Catherine Godson<sup>1,5</sup> and Helen M. Roche<sup>1,2,4</sup>

<sup>1</sup> UCD Diabetes Complications Research Centre, Conway Institute of Biomolecular and Biomedical Research, University College Dublin, Dublin, Ireland, <sup>2</sup> Nutrigenomics Research Group, UCD Institute of Food and Health, School of Public Health, Physiotherapy and Sports Science, University College Dublin, Dublin, Ireland, <sup>3</sup> Centre of Excellence for Public Health, Queen's University Belfast, Belfast, United Kingdom, <sup>4</sup> UCD Institute of Food and Health, School of Public Health, Physiotherapy and Sports Science, University College Dublin, Dublin, Ireland, <sup>5</sup> School of Medicine, University College Dublin, Dublin, Ireland

### Edited by:

Jeanette Erdmann, Universität zu Lübeck, Germany

### Reviewed by:

Wolfgang Lieb, Christian-Albrechts-Universität zu Kiel, Germany

Clint L. Miller, University of Virginia, United States

Loreto Munoz Venegas, Universität zu Lübeck, Germany

> \*Correspondence: Maria F. Hughes maria.hughes@ucd.ie

### Specialty section:

This article was submitted to Cardiovascular Genetics and Systems Medicine, a section of the journal Frontiers in Cardiovascular Medicine

> Received: 07 May 2018 Accepted: 01 October 2018 Published: 06 November 2018

### Citation:

Hughes MF, Lenighan YM, Godson C and Roche HM (2018) Exploring Coronary Artery Disease GWAs Targets With Functional Links to Immunometabolism. Front. Cardiovasc. Med. 5:148. doi: 10.3389/fcvm.2018.00148

Finding the genetic variants that cause functional disruption or regulatory change among the many variants implicated by GWAs remains a key challenge in translating GWAs findings into therapeutic treatments. Defining the causal mechanisms behind these variants requires functional screening experiments that can be complex and costly. Prioritizing variants for functional characterization, using techniques that capture important functional and regulatory elements, can assist this process. The genetic architecture of complex traits such as cardiovascular disease and type II diabetes comprises an enormous number of variants of small effect that contribute to heritability and are spread throughout the genome. This makes it difficult to distinguish which variants or core genes are most relevant for prioritization and how they contribute to the regulatory networks whose dysregulation leads to disease. Despite these challenges, recent GWAs for coronary artery disease (CAD) prioritized genes associated with lipid metabolism, coagulation, and adhesion, along with novel signals related to innate immunity, adipose tissue, and vascular function, as important core drivers of risk. We focus on three examples of novel CAD-associated signals that affect risk through missense or UTR mutations, indicating their potential for therapeutic modification. These variants play roles in adipose tissue function, vascular function, and innate immunity, which form the cornerstones of immunometabolism. In addition, we explore the putative but potentially important interactions between the environment, specifically food and nutrition, and these key processes.

Keywords: GWAS, immuno-metabolism, coronary artery disease, nutrition, omnigenic

# INTRODUCTION

Cardiovascular disease (CVD) is increasing, and the distribution of risk factors is changing with the rising prevalence of obesity and type II diabetes (T2D), particularly among young adults (aged 18–45) in developed countries (1–3). The burden of CVD risk factors remains very high because of unhealthy contemporary lifestyles, with a dysregulated balance between energy intake and physical activity. In addition, malnutrition, in which excess energy intake is coupled with micronutrient deficiencies, amplifies genetic risk (4). The major cardiovascular consequences of obesity and T2D derive predominantly from dysregulated and inflamed adipose tissue, particularly perivascular fat or visceral fat surrounding the organs (5). Visceral fat has limited expandability and becomes inflamed; the resulting adipokine dysregulation adversely affects vascular biology by promoting vasoconstriction, medial smooth muscle cell proliferation, and endothelial dysfunction, a state known as dysregulated immunometabolism (6). In the obese state, immune cells become activated and infiltrate metabolic tissues. Chronic activation of inflammatory pathways in both vascular and immune components triggers stress kinase activation, which impinges on the signaling of metabolic hormones such as insulin and leads to impaired glucose and lipid homeostasis (7). Highly structured interactions between immune and metabolic responses are evolutionarily conserved, and disruption of these interactions underlies many pathologies such as obesity and diabetes. Therapeutic solutions to tackle obesity, T2D, and hypertension are urgently needed to reduce the overall burden of cardiovascular disease. However, many drugs or interventions have failed owing to a lack of understanding of complex disease architecture (8, 9).

GWAs have provided unique insights into the genetic architecture of complex diseases. Genetic architecture describes the overall composition of variants influencing a trait in terms of their number, frequency, magnitude of effect, and potential interactions, and it can vary across traits (10). With the increasing size and scope of GWAs, it has become clear that many complex traits are driven by an enormous number of variants of small effect. These variants are spread across the genome rather than concentrated in disease-related pathways, and many have no obvious connection to the disease or its risk factors. They potentially capture most of the regulatory variants that are active in disease-relevant tissues, and the regulatory networks these variants form may be so interconnected that they affect the functions of core disease-related genes. This is reflected in the observation that such variants are heavily concentrated in regions that are transcribed or marked by active chromatin in disease-relevant tissues, but show little enrichment in cell-type-specific regulatory elements compared with broadly active regions. Boyle et al. (11) proposed that this pattern could be explained by an omnigenic model of inheritance, an extension of R. A. Fisher's infinitesimal model of inheritance proposed nearly a century ago (12). The omnigenic model holds that gene regulatory networks are so interconnected that all genes expressed in disease-relevant cells can affect the functions of core disease-related genes. Most heritability is then accounted for by effects of genes in peripheral pathways, outside the core pathways, which also accounts for loci associated with multiple traits (pleiotropy). Therefore, disease risk may be driven largely by genes with no direct relevance to the disease, with their effects propagated through regulatory networks to a much smaller number of core genes with direct effects.

In theory, the set of core genes should have a more pronounced effect on disease traits, and proteins derived from these genes would drive pharmaceutical development and therapeutic strategies. However, how this works at the level of cellular regulatory networks is incompletely understood. To understand the relevance of the variants for therapeutic development, it is crucial to understand their effect on protein level, activity or function. Even if a variant has only a small effect on protein level and disease risk, the protein may still be a suitable target for disease prevention when considered in the context of its disease architecture (10). For instance, the SNP associated with HMGCR (HMG-CoA reductase) explains only 0.26% of the variance in LDL levels, yet pharmacological inhibition of the enzyme it encodes (by statins) can reduce LDL levels by 30–40% and reduce CAD risk (13). Even when variants do affect protein level or function, there are numerous challenges to drug development.
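The HMGCR example turns on the difference between the variance a SNP explains and the effect of pharmacologically manipulating the gene product. Under an additive model, a variant with allele frequency p and per-allele effect β (in SD units of the trait) explains R² = 2p(1 − p)β². The short sketch below inverts this relationship; the allele frequency of 0.4 is a hypothetical value used for illustration, not the frequency of the actual HMGCR variant.

```python
import math


def variance_explained(p: float, beta_sd: float) -> float:
    """Fraction of trait variance explained by an additive biallelic SNP.

    p: effect-allele frequency; beta_sd: per-allele effect in trait SD units.
    """
    return 2.0 * p * (1.0 - p) * beta_sd ** 2


def beta_from_r2(p: float, r2: float) -> float:
    """Per-allele effect (in SD units) implied by variance explained r2."""
    return math.sqrt(r2 / (2.0 * p * (1.0 - p)))


# 0.26% of LDL variance, with a hypothetical allele frequency of 0.4.
beta = beta_from_r2(p=0.4, r2=0.0026)
print(f"implied per-allele effect: {beta:.3f} SD")  # about 0.074 SD
```

A per-allele shift of well under a tenth of a standard deviation is small compared with the 30–40% LDL reduction achieved by inhibiting the enzyme, which is why a small genetic effect does not rule out a gene as a drug target.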

# AN OMNIGENIC ARCHITECTURE FOR CARDIOVASCULAR DISEASE?

GWAs have reproducibly associated over 160 variants with cardiovascular disease (14–20). By combining data from UK Biobank (34,541 cases and 261,984 non-cases), followed by replication in CARDIoGRAMplusC4D (88,192 cases and 162,544 controls), van der Harst and Verweij recently prioritized an additional 64 novel loci (20), bringing the total number of loci associated with coronary artery disease (CAD) to 163 (21). Consistent with an omnigenic architecture for CAD, many of the novel candidate genes had no obvious connection to CAD, and the genetic contribution was concentrated in regions transcribed or marked by active chromatin in relevant tissues (blood vessels and liver), with little enrichment for cell-type-specific regulatory elements. Although they reconstituted a larger number of gene pathways/networks for CAD (an increase from 4 to 14%), overall the variants were spread throughout the genome, with only 14% forming disease-relevant pathways.
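Statements such as "the genetic contribution was concentrated in regions marked by active chromatin" are typically backed by an enrichment test. One simple version, shown here as a sketch rather than the method used in (20), is a one-sided hypergeometric test: given N background SNPs of which K fall in an annotation, how surprising is it that k of n associated SNPs fall in it? All counts below are hypothetical.

```python
from math import comb
from typing import Tuple


def hypergeom_enrichment(N: int, K: int, n: int, k: int) -> Tuple[float, float]:
    """One-sided hypergeometric test for annotation enrichment.

    N: background SNPs; K: background SNPs in the annotation;
    n: associated SNPs; k: associated SNPs in the annotation.
    Returns (fold enrichment, P(X >= k)).
    """
    fold = (k / n) / (K / N)
    upper = min(K, n)
    # Exact tail sum using Python's arbitrary-precision integers.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return fold, tail / comb(N, n)


# Hypothetical counts: 100,000 background SNPs, 15% in active chromatin;
# 160 lead SNPs, of which 45 fall in active chromatin.
fold, p = hypergeom_enrichment(N=100_000, K=15_000, n=160, k=45)
print(f"fold enrichment {fold:.2f}, p = {p:.2e}")
```

Real enrichment analyses usually also match background SNPs on allele frequency and LD structure; this sketch ignores those refinements.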

# PRIORITIZING VARIANTS USING INFORMATION FROM FUNCTIONAL AND REGULATORY REGIONS

To prioritize core CAD-related genes, van der Harst and Verweij fine-mapped the associated regions, characterizing the functional, cellular and regulatory contribution of the variants (22), and ranked their significance using probabilistic models (23) to derive a set of genes with converging evidence of potential functional SNP-gene mechanisms for follow-up studies. The fine-mapping methods they used are compared with related methods in **Table 1**, as reviewed in Schaid et al. (30). Integrating information from multiple omics approaches in this way provides a more comprehensive understanding of the flow of information from a disease driver to its functional consequences or interactions, and methods can now test the mediating mechanisms of these genetic variants on complex traits (31). Their analysis mapped 161 variants to candidate genes based on proximity, expression quantitative trait loci (eQTL) data, DEPICT analysis and long-range chromatin interactions between variants and gene promoters, and, using stringent conditions, identified 28 loci with convincing arguments for causal variation (22 known and 6 novel), or 19 potential core genes (with missense, intergenic, downstream or UTR variants). Among the known genes, APOE, PCSK9, ANGPTL4 and SORT1 are implicated as core genes in lipid metabolism (a key component of immunometabolism), and targeting the effects of these genes can reduce CAD risk (32–34). Of the 6 novel signals, 3 are intergenic, while 3 act through a missense variant or lie in a 3′ UTR region; these are TRIM5, FNDC3B and CCM2, which are implicated in innate immunity, adipogenesis and vascular function, respectively, and all require functional follow-up (**Figure 1**). Their study aimed to prioritize the CAD associations and elucidate regulatory connections that may underlie the associations, but according to the omnigenic model, broader regulatory connections between core genes must exist, and these are difficult to elucidate.

### TABLE 1 | Summary of methods for fine mapping variants from GWAs.
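The probabilistic fine-mapping step can be illustrated with one widely used approach, Wakefield's approximate Bayes factor (ABF) under the assumption of a single causal variant per locus. This is a sketch of the general technique and not necessarily the model used in (23); the prior standard deviation of 0.2 on the log odds ratio scale and the SNP summary statistics are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def log_abf(beta: float, se: float, prior_sd: float = 0.2) -> float:
    """Wakefield's log approximate Bayes factor (association vs. no association)."""
    v = se ** 2          # sampling variance of the estimated effect
    w = prior_sd ** 2    # prior variance of the true effect
    r = w / (v + w)
    z = beta / se
    return 0.5 * math.log(1.0 - r) + 0.5 * r * z ** 2


def posterior_probs(snps: Dict[str, Tuple[float, float]],
                    prior_sd: float = 0.2) -> Dict[str, float]:
    """Posterior probabilities assuming exactly one causal SNP in the region
    and an equal prior probability for every SNP."""
    labf = {snp: log_abf(b, s, prior_sd) for snp, (b, s) in snps.items()}
    m = max(labf.values())  # subtract the maximum for numerical stability
    denom = sum(math.exp(x - m) for x in labf.values())
    return {snp: math.exp(x - m) / denom for snp, x in labf.items()}


def credible_set(pp: Dict[str, float], level: float = 0.95) -> List[str]:
    """Smallest set of top-ranked SNPs whose posteriors sum to at least level."""
    chosen, total = [], 0.0
    for snp, p in sorted(pp.items(), key=lambda kv: kv[1], reverse=True):
        chosen.append(snp)
        total += p
        if total >= level:
            break
    return chosen


# Hypothetical summary statistics: (log odds ratio, standard error).
region = {
    "snp_a": (0.090, 0.012),
    "snp_b": (0.085, 0.012),
    "snp_c": (0.020, 0.012),
    "snp_d": (0.000, 0.012),
}
pp = posterior_probs(region)
cs = credible_set(pp)
print({snp: round(p, 3) for snp, p in pp.items()})
print("95% credible set:", cs)
```

SNPs in strong LD with similar z-scores share posterior probability, which is why credible sets often contain several candidates that then need functional annotation (eQTLs, chromatin interactions) to resolve.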

Several major challenges stand in the way of understanding how GWAs associations could become therapeutic targets. Most GWAs associations lie within non-coding regions, making it difficult to predict their functions and identify their target genes. Loci can be linked to multiple genes, and identifying the likely causal variant requires detailed investigation to elucidate the underlying mechanism. Functional follow-up of important GWAs candidate loci now shows that multiple variants of small effect can synergistically drive dysfunction in regulatory networks, as for risk related to FTO (35), ANGPTL4 (17), GUCY1A3 (36) and SHROOM3 (37). To understand the mechanistic basis of the increased adiposity associated with FTO, layers of omics data connecting epigenetic, gene co-expression and regulator expression data, followed by validation with genome editing, showed that the risk variant rs1421085 causes a loss of ARID5B-mediated repression, which enhances expression of IRX3 and IRX5 and increases fat storage (35). Mining available omics data to gain insights into the complex regulatory circuitry behind these association signals has the potential to speed up functional follow-up by identifying novel links. We consider the three novel signals highlighted by van der Harst and Verweij in terms of their strength of evidence, their importance to pathways contributing to CAD risk or related traits such as adiposity, and how these signals fit with other evidence supporting their contribution to disease risk. These may represent core genes, or they may be context- or cell-specific signals for CAD. We also consider what these cell- or tissue-derived signals could offer therapeutically if they are validated in independent studies. To this end, we explore a few examples in which this paradigm may be relevant.

# TRIM5, INNATE IMMUNE SIGNALING AND CAD RISK

The variant rs11601507 causes a missense mutation in TRIM5 and increases CAD risk (OR 1.09, 95% CI 1.06–1.11; p = 2.1 × 10−12). Chromatin interactions between this variant and eQTLs in the promoters/enhancers of three other genes (TRIM6, OR52S1, OR52B6) suggest that these genes enhance the expression of TRIM5. Chromatin interactions reveal the organization of chromatin in 3D space, which may indicate biological function such as promoter-enhancer interactions. The evidence supporting rs11601507 comes from a range of Hi-C experiments in cell lines (20, 38). rs11601507 is a cis-QTL for HBG2 (hemoglobin) in whole blood (39) and shows significant tissue-specific enrichment in veins and blood vessels [DEPICT analysis, (20)]. Ingenuity® Pathway Analysis (IPA®) prioritized TRIM5 and TRIM6, along with 14 other genes, for association with CVD; IPA® considers upstream and downstream regulators of gene expression based on large-scale causal networks (40). Interestingly, this same missense variant rs11601507 and a 5′ UTR variant in TRIM5, rs3824949, have previously been associated with mean platelet volume (p = 6 × 10−19 and p = 1 × 10−24, respectively) (41), which is an example of pleiotropy.
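Published associations are often reported only as an odds ratio with a 95% confidence interval. The sketch below recovers the standard error of log(OR), the z-statistic and a two-sided p-value from those numbers, assuming the CI was computed as log(OR) ± 1.96 × SE on the log scale.

```python
import math


def z_and_p_from_or_ci(odds_ratio: float, lower: float, upper: float,
                       z_crit: float = 1.96):
    """Recover log(OR), its standard error, z and a two-sided p-value
    from an odds ratio and its (symmetric on the log scale) 95% CI."""
    log_or = math.log(odds_ratio)
    se = (math.log(upper) - math.log(lower)) / (2.0 * z_crit)
    z = log_or / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return log_or, se, z, p


log_or, se, z, p = z_and_p_from_or_ci(1.09, 1.06, 1.11)
print(f"log(OR) = {log_or:.4f}, SE = {se:.4f}, z = {z:.2f}, p = {p:.1e}")
```

Because the published CI is rounded to two decimals, the recomputed p-value only approximates the reported p = 2.1 × 10−12; both lie far below the conventional genome-wide significance threshold of 5 × 10−8.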

Given the enormous dimensionality of the phenome, it is unlikely that functional variants exist without pleiotropic effects (42). Pleiotropy can involve a variant having effects on two or more traits via independent pathways (e.g., effects in different tissues), or the effect of the variant on one trait being causally related to variation in another trait. The risk allele of rs11601507 has the same direction of effect for CAD and mean platelet volume. Using rs11601507 and other variants in a risk score, Astle et al. demonstrated a weak causal relationship between mean platelet volume and CAD risk using Mendelian randomization. Mean platelet volume is associated with increased hemolysis or free hemoglobin in the blood, which is linked to increased inflammation. The TRIM5 association may therefore affect both traits through inflammatory pathways.
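Mendelian randomization with multiple variants is often summarized with the inverse-variance weighted (IVW) estimator, which combines per-variant ratio estimates β_Y/β_X weighted by β_X²/SE_Y². The sketch below assumes independent instruments and no horizontal pleiotropy; the summary statistics are hypothetical, and this is not presented as the exact analysis performed by Astle et al.

```python
import math
from typing import Sequence, Tuple


def ivw_mr(beta_x: Sequence[float], beta_y: Sequence[float],
           se_y: Sequence[float]) -> Tuple[float, float]:
    """Fixed-effect inverse-variance weighted MR estimate and its standard error.

    beta_x: variant effects on the exposure (e.g. mean platelet volume);
    beta_y: variant effects on the outcome (e.g. log odds of CAD);
    se_y: standard errors of beta_y.
    """
    num = sum(bx * by / sy ** 2 for bx, by, sy in zip(beta_x, beta_y, se_y))
    den = sum(bx ** 2 / sy ** 2 for bx, sy in zip(beta_x, se_y))
    return num / den, 1.0 / math.sqrt(den)


# Hypothetical instruments (not real mean platelet volume or CAD statistics).
bx = [0.10, 0.08, 0.15, 0.05]
by = [0.006, 0.004, 0.010, 0.002]
sy = [0.002, 0.002, 0.003, 0.002]
est, se = ivw_mr(bx, by, sy)
print(f"IVW estimate {est:.3f} (SE {se:.3f})")
```

A pleiotropic variant such as rs11601507, which could act on CAD through pathways other than platelet volume, violates the no-pleiotropy assumption; sensitivity methods (for example MR-Egger or median-based estimators) are normally run alongside IVW for that reason.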

TRIM5 promotes interferon γ (IFNG) in macrophages as part of the innate immune system (43). It acts as a capsid-specific restriction factor that prevents infection by non-host-adapted retroviruses. Interestingly, TRIM5 reciprocally enhances ubiquitination, leading to co-operative action of the IFNG and NF-κB pathways (44). There is a dynamic relationship between the innate immune system and metabolism, in which reconfiguration of energy metabolism between oxidative phosphorylation and glycolysis can define the immune phenotype (45, 46). Fatty acids and other metabolites can influence and define immune cell functionality and cause metabolic reprogramming (45). This dynamic and reciprocal regulatory relationship between metabolism and inflammation is hypothesized to play a key role in metabolic diseases, including CAD (7). Macrophages play a key role as innate immune cellular mediators of inflammation. Activated macrophages can recruit other monocytes/macrophages to a developing lesion, increase lipid uptake, and instigate metabolic stress and reprogramming in adipose tissue. Macrophages can become "metabolically activated" in the presence of glucose, insulin and palmitate. Metabolically activated macrophages show effects similar to those of classically activated macrophages: both activate the TLR4 and NF-κB pathways to promote pro-inflammatory cytokine secretion. However, metabolically activated macrophages also activate PPARγ, thereby controlling inflammation by promoting lipid metabolism (47). TRIM5 promotes IFNG, and IFNG, through a mechanism of decreasing tryptophan metabolism (on which viruses rely), inhibits the central metabolic regulator mTOR, metabolically reprograms macrophages to switch from glycolysis to oxidative phosphorylation, and upregulates inflammation. CVD is associated with changes in many immune cell types at multiple sites of critical metabolic function, with a cumulative detrimental effect on cholesterol, lipid and glucose homeostasis (7, 22).

# OLFACTORY SIGNALING INFLUENCES TRIM5 AND IS ALSO LINKED TO ADIPOSITY

The genetic mechanism associated with the TRIM5 variant suggests that enhanced olfactory signaling enhances TRIM5 (innate immune signaling) to reduce lipolysis, which enhances adiposity and increases risk of CAD. OR52B6 and OR52S1 are G-protein-coupled olfactory receptors (ORs) (48). These receptors interact with odorant molecules in the nose to initiate a neuronal response that triggers the perception of a smell. OR52B6/OR52S1 have not been linked by GWAs signals as important regulatory variants, but other variants related to olfaction have been significantly linked to obesity development through GWAs (8). Olfactory signaling is highly complex and can play a bidirectional role in controlling energy homeostasis in response to sensory and hormonal signals from the central nervous system (CNS) (49). Essentially, the ORs may alter the drive to eat a poor diet, leading to obesity and thereby compounding an environmental insult. Reduced olfactory signaling increases β-adrenergic receptors on white (WAT) and brown adipose tissue (BAT), increasing lipolysis and fatty acid oxidation and reducing obesity in mice (49). Olfaction is also linked to ADCY3, a gene carrying loss-of-function mutations (50), and to its interaction with the major obesity gene MC4R, disruption of which impairs ciliary targeting in neuronal cells critical for body-weight regulation (35, 51). Mice heterozygous or homozygous null for ADCY3 are unable to smell (35).

In summary, innate immunity is important in the pathogenesis of CVD, and here the association of a variant linked to innate immunity is reinforced and mediated through a novel mechanism involving olfaction. The immunosuppressant drug cyclosporine is an antagonist of TRIM5, suggesting that a potential therapeutic intervention is available to explore for functional relevance (20). More generally, targeting systemic inflammation through interleukin 1β (e.g., canakinumab) has been shown to reduce CVD risk, thereby validating the inflammatory hypothesis of atherothrombosis (52). Variants in TRIM5 and PROCR (p = 6.8 × 10−12) that reach GWAs significance are related to inflammation; these relatively newly identified associations show convergence between the biological and genetic determinants of CVD and add to the inflammatory hypothesis (18, 20). An alternative therapeutic paradigm to anti-inflammatory modalities may be efforts to mimic the resolution of inflammation using specialized lipid mediators and their targets (53–55).

### FNDC3B, ADIPOGENESIS AND CAD RISK

rs12897 is a common variant (MAF 0.41) showing a protective association with CAD (OR 0.96, 95% CI 0.95–0.97; p = 1.9 × 10−10). This SNP is an eQTL for the protein-coding gene fibronectin type III domain containing 3B (FNDC3B) (39) and lies in the 3′ UTR of the mRNA, likely affecting post-transcriptional regulation of gene expression. FNDC3B was the third-ranked gene prioritized by DEPICT (p = 1 × 10−21) in the overall analysis (20). IPA® prioritized a functional association between the FNDC3B protein, TRIM5, TRIM6, VEGFA and 12 other genes for association with CAD, supporting a broader connectivity among them.

Adipogenesis is a key regulatory process that determines adipose functionality, and its dysfunction is associated with metabolic inflammation, hypoxia and related risks, including insulin resistance (6) and deregulated cholesterol homeostasis and lipid metabolism (56), all of which lead to greater T2D and CVD risk (57). FNDC3B (alias FAD104) is a positive regulator of adipogenesis (58), specifically at its early stages (59, 60). The FNDC3B variant rs12897 was previously associated in large-scale GWAs with height (p = 3 × 10−39) (61), waist-to-hip ratio adjusted for BMI (WHRadjBMI; p = 8 × 10−10) and hip circumference adjusted for BMI (p = 3 × 10−12) (62), and heart rate (p = 1 × 10−9) (63). Interestingly, intronic variants near FNDC3B are strongly associated with intraocular pressure (p = 9 × 10−48 (64) and p = 5 × 10−50 (65)); however, these variants are not in LD with rs12897. Although intraocular pressure may reflect changes in heart rate, this association may operate through a different, peripheral CAD pathway.
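Whether two nearby signals, such as the intraocular pressure variants and rs12897, represent the same association is usually judged by their linkage disequilibrium. The sketch below computes the standard r² measure from haplotype and allele frequencies; in practice these frequencies come from a reference panel, and the values used here are hypothetical.

```python
def ld_r2(p_ab: float, p_a: float, p_b: float) -> float:
    """r^2 between two biallelic SNPs from haplotype and allele frequencies.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and B at SNP 2;
    p_a, p_b: frequencies of A and B.
    """
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return d ** 2 / (p_a * (1 - p_a) * p_b * (1 - p_b))


# Hypothetical cases: alleles that always co-occur, and alleles that are independent.
print(ld_r2(0.41, 0.41, 0.41))      # complete LD
print(ld_r2(0.41 * 0.3, 0.41, 0.3))  # no LD
```

An r² near 0 means the two variants carry largely independent information, so their associations are unlikely to reflect a single underlying causal variant.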

GWAs on specific adiposity traits and fat distributions (pericardial fat, visceral fat, WHRadjBMI, body fat percentage) have shown distinct genetic components (66, 67). GWAs of WHRadjBMI and body fat percentage identified adipogenesis candidate genes that play key roles in adiposity, including BMP2 (p = 3.3 × 10−14), CEBPA, PPARγ, HOXCmir196, TBX15 and PEMT, but these variants had no apparent regulatory links/eQTLs (62). While CEBPA and PPARγ are essential for white adipose tissue differentiation and are master regulators of adipogenesis, BMP2, like FNDC3B, is involved in early-stage adipogenesis. FNDC3B and BMP2 are both involved in the early-stage commitment of pre-adipocytes to proliferate and differentiate. FNDC3B (and BMP2) specifically induce and/or regulate the differentiation of committed progenitor cells toward adipogenesis or osteogenesis (68). Adipocytes, and in particular pre-adipocytes, are now recognized as more than fat-storing cells, having the capability to secrete cytokines and adipokines and thus contributing to inflammation (69).

## ADIPOGENESIS AS A THERAPEUTIC MECHANISM TO REDUCE METABOLIC RISK

Defining the effectors that control the fate of adipocytes is of great interest for the therapeutic treatment of obesity. Obese individuals have a smaller proportion of brown adipose tissue (BAT) relative to white adipose tissue (WAT); WAT expands in response to lipid excess by hypertrophy, hyperplasia and inflammation and, upon reaching a certain size, becomes dysfunctional and necrotic, promoting macrophage infiltration. The conversion of WAT to the more functional, energy-dissipating BAT adipocytes would be a valuable approach to the treatment of obesity and its metabolic complications and is becoming the focus of anti-obesity research (9). Conversion of WAT to BAT can occur by two processes: adipogenesis (i.e., de novo differentiation of precursor cells into adipocytes, in which FNDC3B may play a role) or, more commonly, trans-differentiation (i.e., the WAT to beige/brite transition through molecular reprogramming, which increases mitochondrial oxidative phosphorylation/lipolysis and requires increased levels of uncoupling protein 1 (UCP-1) and β-adrenoceptor innervation) (9). BAT derived from adipogenesis is more sensitive to stimuli from BMP7 (70), BMP4 (71), irisin/FNDC5 (72), FGF21 and others. Irisin/FNDC5 is a myokine/cytokine that induces thermogenesis, except in the obese state, where it has a complex adaptive response to counterbalance decreased insulin sensitivity and other metabolic disorders associated with obesity (73); it is a key molecular target to induce browning of WAT (9). FNDC5 expression is highest at early stages of preadipocyte differentiation, and FNDC5 shares 53% homology with FNDC3B, but their relationship is unclear. While the eQTL affects the expression of FNDC3B, it is not known whether this regulation is specific to a CVD-relevant tissue or cell type. If regulation of FNDC3B is a key step in increasing adipogenesis, modulation of this process could enhance thermogenesis.

The strongest obesity association identified to date, at FTO, can act through a complex regulatory network that also affects preadipocyte differentiation, highlighting the importance of this pathway (35). Interestingly, this regulation is restricted in a cell/tissue-specific way to preadipocytes and mesenchymal adipocyte progenitors, and does not occur in brain or 120 other cell types (35). The causal variant at the FTO locus disrupts ARID5B binding in the risk haplotype, leading to a loss of repression; this derepresses pre-adipocyte enhancer activity and increases IRX3 and IRX5 expression, which represses mitochondrial thermogenesis and adipocyte browning, making cells more likely to store fat.

## OTHER VARIANTS OF GENES INFLUENCING TRANSDIFFERENTIATION OF WAT TO BAT ARE ALSO LINKED TO CAD RISK

In addition to FNDC3B, variants near two other genes recently associated with CAD risk, PRDM16 and TWIST1, play key roles in the adipogenic trans-differentiation of WAT to BAT, which highlights this pathway as relevant to disease risk and therapeutic exploration. From a biological perspective, PRDM16 is one of the most effective molecular targets to induce white-to-brown adipocyte trans-differentiation (9), and an intronic variant close to PRDM16, rs2493298, was recently identified to increase CAD risk (20). The variant rs2493298 (p = 1.9 × 10−9) occurs in an intronic region that physically interacts (chromatin, Hi-C experiments) with the promoters of three genes, consistent with an enhancer function (20), and these genes have roles in metabolism (9). PRDM16 is essential for normal BAT function; it interacts with C/EBPβ, and together these are considered master regulators of BAT function (74), and it functions in a feedback loop with PPARγ and SIRT1. No pharmacological targets for PRDM16 are advanced enough to explore in clinical trials (9). PPARγ coactivator 1 alpha (PGC-1α) is another important control point of the BAT phenotype; it is repressed by TWIST1, which blocks PGC-1α target genes that lead to browning of WAT (75). An eQTL, rs2107595, intergenic to TWIST1 increases risk of CAD (p = 1.3 × 10−24) and was prioritized as a core CAD-related gene (20). This variant had previously been linked to the HDAC9 gene through proximity, but expression data from the Stockholm-Tartu Atherosclerosis Reverse Network Engineering Task Study (STARNET) in two different tissues prioritized this eQTL variant to TWIST1 (76). PGC-1α can also be manipulated through dietary fat modification to reduce inflammatory disease risk and enhance adipogenesis (6).

In summary, several variants associated with adipogenesis have been associated with CAD risk. Dysregulation of adipocyte browning/thermogenesis, particularly in the visceral fat surrounding the thoracic aorta and aortic arch, is important in the pathogenesis of CVD. FNDC3B is among several genes that impact adipogenesis; while the regulatory networks connecting them still need to be more completely understood, manipulation of this pathway at the preadipocyte stage could impact CVD risk.

# CCM2, ENDOTHELIAL FUNCTION AND CVD RISK

The variant rs2107732 causes a missense mutation in the CCM2 gene and is associated with reduced risk of CAD (OR 0.94, 95% CI 0.93–0.96; p = 3.6 × 10−8) (20). A variant in the promoter of MYOG1 forms a chromatin interaction with the CCM2 variant, suggesting that it regulates CCM2 (20). MYOG1 is a muscle-specific transcription factor that induces myogenesis (muscle formation). Mutations in the mouse orthologue of CCM2 cause a cardiovascular phenotype in mice, and mutations in MYOG1 caused abnormalities in inflammation/white blood cells of mice (Mouse Genome Informatics database) (20). Inherited loss-of-function mutations in CCM2 (and also in the CCM1 and CCM3 genes) are implicated in abnormal vascular morphogenesis and can cause vascular lesions called cerebral cavernous malformations, which develop in the human CNS (77). CCM2 is expressed in the brain and heart, and the CCM genes (CCM1, CCM2 and CCM3) are crucial regulators of heart and vessel formation and integrity, restricting vascular permeability and maintaining vascular homeostasis (78–80). These genes form complexes but also have complex independent roles (81). CCM2 restricts vascular permeability and maintains endothelial barrier function (tight and adherens junctions) by inhibiting RhoA/Rho kinase activity through enhanced proteasomal degradation of RhoA (79, 82). A lack of CCM2 increases RhoA/Rho kinase activity, which disrupts endothelial cell-cell contacts, causing permeability and stress fiber formation; this is the initial phase of many cardiovascular diseases and is characteristic of pathologically activated vascular endothelium. The response of CCM2 may differ in the inflammatory state, and the MYOG1 transcription factor may influence this under certain conditions. Endothelial dysfunction reduces the ability of arteries to dilate fully in response to endothelium-derived vasodilators such as nitric oxide (NO); decreased availability of NO, or its inactivation by reactive oxygen species, increases dysfunction. An intronic variant in NOS3 is also prioritized as a core causal variant of CAD alongside CCM2, both being important to blood vessel morphology and function (20), together with several other rare and common variants in GUCY1A3, PDE5A and PDE3A (16, 20, 36, 83, 84), highlighting the importance of the NO/cGMP signaling pathway to atherosclerosis and CAD risk.

Increased vascular permeability correlates with neoangiogenesis (80). CCM proteins also control angiogenesis via Rho kinase and other signaling pathways (78); CCM2 inhibits angiogenesis, and loss of CCM2 causes dramatic angiogenic remodeling abnormalities (85). Adipose tissue is probably the most highly vascularized tissue in the body; because each adipocyte is encircled by capillaries, angiogenesis plays a key role in its function (86). Angiogenesis is driven by a complex interplay of angiogenic factors and inhibitors, including vascular endothelial growth factor A (VEGFA). VEGFA is among the 64 novel CAD loci, increasing risk of CAD (OR 1.05, 95% CI 1.03–1.06; p = 1.9 × 10−12) (20), and is associated with waist-to-hip ratio adjusted for BMI (p = 3 × 10−27) (62). CCM proteins, and particularly CCM3, can regulate VEGFA expression (80) [typically CCM2 and CCM3 function as a complex (81)]. Lipid accumulation in adipocytes activates RhoA/Rho kinase signaling, breaking endothelial cell barriers and inducing stress fiber formation, which triggers inflammatory changes (87). Vascular remodeling determines the flexibility and metabolic rate of adipose tissue, and communication between adipose and endothelial cells is crucial. Dysfunctional communication in obese individuals, including impaired vasodilation, hypoxia and inflammation, contributes to the development and progression of T2D. CCM2 and VEGFA play roles at the interface of this cellular communication. A more complete understanding of the regulatory networks connecting CCM2 (which inhibits angiogenesis) and VEGFA (which stimulates angiogenesis, and for which targeted drugs already exist) might synergistically increase therapeutic efficacy against obesity and CVD.

# GWAS TARGETS WITH FUNCTIONAL LINKS TO IMMUNO-METABOLISM AND CORONARY ARTERY DISEASE

In summary, the three novel GWAs signals implicated in CAD risk play putative roles in immunometabolism (**Figure 1**). TRIM5 has the potential to increase innate immunity, inflammation and CAD risk via macrophage infiltration of adipose tissue, increasing metabolic stress. Activation of pro-inflammatory immune cells requires a shift from energy-efficient oxidative phosphorylation to anaerobic glycolysis, favoring glucose as a substrate. This break or shift occurs when macrophages become polarized toward the M1 phenotype and is associated with production of nitric oxide, an M1 effector molecule, triggered by increasing oxidative stress. The mutation in CCM2 may reduce oxidative stress to maintain endothelial function, and may control angiogenesis and vascular remodeling of blood vessels, including those surrounding adipose tissue, to reduce CAD risk. Inhibiting glycolysis promotes the resolution of inflammation. FNDC3B enhances adipose tissue function by increasing adipogenesis and improving cellular energy efficiency through oxidative phosphorylation and thermogenesis, with PRDM16 and TWIST1 playing similar roles in modifying CAD risk (**Figure 1**).

## DIETARY INTERVENTIONS CONNECTING ADIPOGENESIS AND METABOLIC INFLAMMATION AS THERAPEUTIC MECHANISMS TO REDUCE METABOLIC RISK

Since the recognition that fatty acids can modulate inflammatory responses, e.g., via lipid-induced reprogramming of macrophage metabolism and inflammation (88) or via the NLRP3 inflammasome (89, 90), they have been studied for their immunomodulatory effects on insulin resistance and dysregulated lipid metabolism pathways. Dietary manipulation and certain nutrients have the potential to modulate inflammatory responses. Obesity promotes adipose hypertrophy, with inflammation interacting with the adipogenic process. The pro-inflammatory cytokines IFNG, IL-1β and TNFα inhibit adipogenesis by downregulating PPARγ and C/EBP (91–94), and several dietary components, e.g., resveratrol, flavonoids and polyphenols, can modulate this effect (95–97). Dietary fat modification, replacing saturated fatty acids (SFA) with monounsaturated (MUFA) and polyunsaturated fatty acids (PUFA), may provide a strategy to lessen inflammation and enhance adipogenesis, attenuating insulin resistance and dysregulated lipid metabolism (6), but the impact of dietary fat modification in humans has varied (98).

Efforts to explain this interindividual variability in response have focused on the interaction between genes, metabolites and diet. As diet is the exogenous source of many metabolites, as well as affecting the generation of endogenous metabolites, interactions with the nutritional environment are plausible. However, many putative gene/variant-diet associations have failed to replicate in large studies (99), despite various approaches to enhance power (100). Some intriguing examples of specific variant-metabolite interactions modulating disease risk exist from small studies (101, 102). The APOA2 variant rs5082 interacts with SFA intake to influence risk of obesity (101). This is mediated through an epigenetic effect on the APOA2 regulatory region, which produced a difference in APOA2 expression between APOA2 genotypes on a high-SFA diet. This selectively dysregulated branched-chain amino acid and tryptophan metabolic pathways, with possible implications for food intake.
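A gene-diet interaction such as APOA2 × SFA intake is usually tested by adding a genotype × diet product term to a regression model. For a binary genotype grouping (e.g., carriers vs. non-carriers, an assumed coding), the interaction coefficient b3 in the saturated model y = b0 + b1·G + b2·E + b3·G·E equals the difference between the diet slopes fitted separately in each genotype group, which is what the sketch below computes. The data are synthetic; a real analysis would adjust for covariates and report a standard error for b3.

```python
from typing import Sequence


def ols_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Ordinary least-squares slope of y on x."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def interaction_from_strata(genotype: Sequence[int], diet: Sequence[float],
                            outcome: Sequence[float]) -> float:
    """b3 in y = b0 + b1*G + b2*E + b3*G*E for a binary genotype G (0/1)."""
    slopes = {}
    for g in (0, 1):
        xs = [e for gi, e in zip(genotype, diet) if gi == g]
        ys = [o for gi, o in zip(genotype, outcome) if gi == g]
        slopes[g] = ols_slope(xs, ys)
    return slopes[1] - slopes[0]


# Synthetic data: the outcome rises 0.1 units per unit of SFA intake in
# non-carriers and 0.3 units in carriers, so the interaction should be 0.2.
sfa = [8, 10, 12, 14, 16, 18]
genotype = [0] * 6 + [1] * 6
diet = sfa + sfa
bmi = [25 + 0.1 * e for e in sfa] + [26 + 0.3 * e for e in sfa]
b3 = interaction_from_strata(genotype, diet, bmi)
print(f"interaction estimate: {b3:.2f}")
```

Because each interaction test splits the sample, power is much lower than for main effects, which is one reason so many reported gene-diet interactions fail to replicate.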

### UNDERSTANDING THE REGULATORY NETWORKS UNDERLYING METABOLIC TRAITS

Attention is shifting to large-scale studies integrating transcriptomic and metabolomic data to understand the interplay between genes and metabolites (103). To explore genes playing key roles in immunometabolism more specifically, Nath et al. integrated transcriptomics (focusing on immune networks) and metabolomics in 2,168 individuals from two general population cohorts (104). They identified significant expression quantitative trait loci in 8 immune gene networks, highlighting the genetic foundations of these effects. For example, an eQTL in the ARHGEF3 gene (rs1354034, p = 7 × 10−28) had trans-regulatory effects on several genes associated with platelet function, and this module had diverse effects on 55 metabolites. Other important core immunometabolic associations related to neutrophil activation and viral response. In a subset of the cohort measured repeatedly over 7 years, the gene-metabolite effects were temporally stable (104). As long-term omics data are collected in population cohorts over time, these signals are likely to become more reliable.
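At its core, an eQTL test is a regression of a gene's expression on the allele dosage (0, 1 or 2 copies) of a nearby (cis) or distant (trans) variant. The sketch below shows that additive test on a toy dataset; the genotype and expression values are hypothetical, and production eQTL pipelines additionally adjust for covariates (for example ancestry components and hidden confounders) and correct for the very large number of SNP-gene tests, which is far larger for trans effects.

```python
import math
from typing import Sequence, Tuple


def eqtl_test(dosage: Sequence[float],
              expression: Sequence[float]) -> Tuple[float, float, float]:
    """Additive eQTL test: regress expression on allele dosage (0, 1, 2).

    Returns (beta, standard error, t statistic) with n - 2 degrees of freedom.
    """
    n = len(dosage)
    mx = sum(dosage) / n
    my = sum(expression) / n
    sxx = sum((g - mx) ** 2 for g in dosage)
    sxy = sum((g - mx) * (e - my) for g, e in zip(dosage, expression))
    beta = sxy / sxx
    alpha = my - beta * mx
    rss = sum((e - alpha - beta * g) ** 2 for g, e in zip(dosage, expression))
    se = math.sqrt(rss / (n - 2) / sxx)
    return beta, se, beta / se


# Hypothetical data: expression rises by about 0.5 units per copy of the allele.
beta, se, t = eqtl_test([0, 0, 1, 1, 2, 2], [1.0, 1.2, 1.5, 1.7, 2.0, 2.2])
print(f"beta = {beta:.2f}, SE = {se:.3f}, t = {t:.1f}")
```

Here each additional allele copy raises expression by 0.5 units (t of about 8.2 on 4 degrees of freedom).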

Identifying the genetic basis of these interactions can be useful therapeutically, either to modulate the variant itself for individualized treatment or to modulate the pathway in which the variant functions, which can have much wider implications for population treatment, e.g., PCSK9 inhibitors for individual- and population-level treatment of hypercholesterolemia and CVD risk (34). With a better understanding of metabolite-immune interactions, in vivo and interventional studies can be developed to modulate these interactions through existing lipid-lowering medications, gut microbe effects or dietary changes. In this way, the immune system itself can be harnessed to reduce the burden of cardiovascular and metabolic disorders (55, 71). With distinct lifestyle strategies now known to differentially affect the way adipose tissue is stored and utilized in the body (105), it is important to understand where and how the drivers of these regulatory networks act, which may depend on specific situations or locations.

# CELLULAR SPECIFICITY OF REGULATORY NETWORKS

Progress has been made in determining the tissues and cell types underlying disease through the GTEx consortium (106). GTEx, Roadmap Epigenomics and the Functional Annotation of the Mammalian Genome 5 (FANTOM5) provide reference sets for multi-tissue gene expression and epigenomics, consistently evaluated on the same individuals where tissues are available. Regulation can occur at different layers, including post-transcriptional and post-translational regulation, protein-protein interactions and intercellular signaling, mediated through chromatin interactions and expression quantitative trait loci. Assuming that most regulation occurs through genes (linked by eQTLs), regulation can occur at the tissue level, at the level of broad cell populations or in very specific cell types (106–110). For CVD, multiple cell types or highly specialized cell types may be involved (e.g., vascular, liver, adipose), and cellular networks could have variable expression across cell types (111). The effect of a particular variant would then be an average of its effect sizes in each cell type, weighted by cell type importance (11). Mapping GWAs signals to promoters/enhancers measured by cap analysis of gene expression (CAGE) showed that regulation for specific diseases could be turned on or off in similarly complex patterns across different cell types. For example, two subtypes of ulcerative colitis could be distinguished by cell-type-specific regulatory networks, based on GWAs-guided regulatory signals enriched either in monocytes exposed to inflammatory signals or in epithelial cells (108).
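The weighting argument from (11) can be written as β_bulk = Σ_c w_c β_c, summing over cell types c. The sketch below assumes importance weights proportional to cell abundance, which is a simplification; the cell types and effect sizes are hypothetical. It illustrates why a variant with a large effect in a rare cell type can look weak in bulk-tissue data.

```python
from typing import Dict


def bulk_effect(cell_effects: Dict[str, float], weights: Dict[str, float]) -> float:
    """Tissue-level effect as the weighted average of cell-type-specific effects.

    weights: relative importance (here, abundance) of each cell type;
    they are normalized to sum to 1 before averaging.
    """
    total = sum(weights.values())
    return sum(cell_effects[c] * w / total for c, w in weights.items())


# Hypothetical: a strong effect confined to a rare cell type.
effects = {"endothelial": 0.8, "adipocyte": 0.0, "macrophage": 0.0}
abundance = {"endothelial": 0.05, "adipocyte": 0.80, "macrophage": 0.15}
print(f"bulk effect: {bulk_effect(effects, abundance):.2f}")  # 0.04
```

A 20-fold dilution of this kind is one motivation for the single-cell and deconvolution approaches discussed next.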

To identify cell-type-specific gene regulation, grouped cell types or deconvolution methods have been used, but these methods tend to be biased toward specific cell types or struggle to identify less abundant cell types (112). It is possible to calculate the probability that a GWAs variant and an eQTL tag the same functional effect, and thereby infer the tissues in which the effect on a trait is likely manifested (113). Single-cell RNAseq (scRNAseq) can identify cell-type- or context-specific eQTLs, but the requirement for fresh tissue and the costs limit large-scale screening. By mapping monogenic kidney mutations and genome-wide variants associated with chronic kidney disease to gene expression from scRNAseq of 57,979 mouse kidney cells, Park et al. inferred that the corresponding genes were expressed in only one particular cell type (114). This suggests that most genetic diseases of the kidney can be traced to single cell types. Using intercellular variation in the expression profiles of 25,000 peripheral blood mononuclear cells from 45 donors, scRNAseq identified cell-type-specific cis-eQTLs. Although gene regulatory networks were highly personal, this approach identified more genes under genetic control, as well as the specific cell type in which each effect is most prominent, and found examples of SNPs influencing the co-expression of two genes (115). scRNAseq has the potential to group and examine cells along the cell cycle, along a differentiation path (e.g., adipocyte differentiation) or along a response to an environmental stimulus (e.g., inflammatory signaling) (116). With an improved understanding of how these genes affect cell types and tissues, more specific targeted interventions can be developed, for instance improved drugs, mobilization of specific fat deposits (105) or nutritional interventions (117).
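The probability that a GWAs variant and an eQTL tag the same functional effect is commonly estimated with a Bayesian colocalization test over five hypotheses (H0: no association; H1: trait 1 only; H2: trait 2 only; H3: two distinct causal variants; H4: one shared causal variant). The sketch below follows the general form of that approach, assuming at most one causal variant per trait in the region and per-SNP log approximate Bayes factors already computed (for instance with Wakefield's formula). The priors p1 = p2 = 1e-4 and p12 = 1e-5 are commonly used defaults, the log Bayes factors are hypothetical, and whether (113) used exactly this model is an assumption.

```python
import math
from typing import Dict, Sequence


def logsumexp(xs: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def coloc_posteriors(labf1: Sequence[float], labf2: Sequence[float],
                     p1: float = 1e-4, p2: float = 1e-4,
                     p12: float = 1e-5) -> Dict[str, float]:
    """Posterior probabilities of H0-H4 from per-SNP log ABFs for two traits.

    Requires at least two SNPs (H3 needs distinct causal SNPs).
    """
    n = len(labf1)
    lh = {
        "H0": 0.0,
        "H1": math.log(p1) + logsumexp(labf1),
        "H2": math.log(p2) + logsumexp(labf2),
        # H3: different causal SNPs (i != j) for the two traits.
        "H3": math.log(p1) + math.log(p2) + logsumexp(
            [labf1[i] + labf2[j] for i in range(n) for j in range(n) if i != j]),
        # H4: the same causal SNP for both traits.
        "H4": math.log(p12) + logsumexp([a + b for a, b in zip(labf1, labf2)]),
    }
    total = logsumexp(list(lh.values()))
    return {h: math.exp(v - total) for h, v in lh.items()}


# Hypothetical log ABFs for three SNPs: GWAs and eQTL peaks at the same SNP...
shared = coloc_posteriors([10.0, 2.0, 0.0], [10.0, 1.0, 0.0])
# ...and a case where the two peaks sit on different SNPs.
distinct = coloc_posteriors([10.0, 0.0, 0.0], [0.0, 0.0, 10.0])
print({h: round(p, 3) for h, p in shared.items()})
print({h: round(p, 3) for h, p in distinct.items()})
```

A high posterior for H4 in a given tissue's eQTL data points to that tissue as a plausible site of action; a high H3 suggests that the GWAs signal and the eQTL are driven by different variants despite their proximity.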

# CONCLUSIONS

We have highlighted three novel variants associated with CAD risk, which have been prioritized and annotated based on systems genetics approaches, including expression quantitative trait locus analysis and network analysis, to infer their functional relevance. These core variants play roles in innate immunity, adipogenesis and endothelial function, processes that drive coronary artery disease, and principally in the way obesity and T2D shape the pathogenesis of CAD through immunometabolism. Core variants representing these pathways provide a starting point for identifying mechanisms amenable to therapeutic manipulation, although further understanding of the regulatory networks connecting them is needed. Given that CAD is a multifactorial disease, it may be possible in the future to develop individual treatment strategies based on these variants, or to design relevant population-level interventions, based on the pathways these variants highlight, for subsets of people in the population with subtypes of CAD risk related to obesity or T2D.

# AUTHOR CONTRIBUTIONS

MH wrote the paper. YL, CG, and HR discussed and wrote sections, and refined and edited the text. All authors agree with the submission.

### FUNDING

HR is the recipient of Science Foundation Ireland (SFI) principal investigator award (11/PI/1119); CG is the recipient of Science Foundation Ireland (SFI) awards (15/IA/3152 and 15/US/B3130). MH and HR were funded by the Joint Programming Healthy Life for a Healthy Diet (JPI HDHL) funded EU Food Biomarkers Alliance FOODBALL (14/JP-HDHL/B3076); MH is also funded through Science Foundation Ireland (SFI) awards (15/IA/3152 and 15/US/B3130). YL and HR were supported by the Irish Department of Agriculture, Food and the Marine, Healthy Beef (13/F/514) programme.




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Hughes, Lenighan, Godson and Roche. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Long Noncoding RNA ANRIL: Lnc-ing Genetic Variation at the Chromosome 9p21 Locus to Molecular Mechanisms of Atherosclerosis

Lesca M. Holdt and Daniel Teupser\*

Institute of Laboratory Medicine, University Hospital, LMU Munich, Munich, Germany

Ever since the first genome-wide association studies (GWAS) on coronary artery disease (CAD), the Chr9p21 risk locus has emerged as a top signal in GWAS of atherosclerotic cardiovascular disease, including stroke and peripheral artery disease. The CAD risk SNPs on Chr9p21 lie within a stretch of 58 kilobases of non-protein-coding DNA, containing the gene body of the long noncoding RNA (lncRNA) antisense noncoding RNA in the INK4 locus (ANRIL). How the Chr9p21 locus affects risk in molecular detail is a matter of ongoing research. Here we review recent advances supporting the view that ANRIL serves as a key risk effector molecule of atherogenesis at the locus. One focus of this review is the shift in understanding that genetic variation at Chr9p21 not only affects the abundance of ANRIL, and in some cases the expression of the adjacent CDKN2A/B tumor suppressors, but also affects ANRIL splicing, such that 3′-5′-linked circular noncoding ANRIL RNA species are produced. We describe how the balance of linear and circular ANRIL RNA, determined by the Chr9p21 genotype, regulates molecular pathways and cellular functions involved in atherogenesis. We end with an outlook on how manipulating circular ANRIL abundance may be exploited for therapeutic purposes.

Keywords: lncRNA (long non-coding RNA), circRNA, GWAS (genome-wide association study), eQTL analysis, transcription, splicing, tumor suppressor proteins, cardiovascular diseases

# INTRODUCTION

Since publication of the first genome-wide association studies (GWAS) of coronary artery disease (CAD) in 2007, Chr9p21 has emerged as the most significant risk locus associated with this common disease (1–4). The region contains a number of strongly interlinked SNPs within a stretch of 58 kilobases (kb) of non-protein-coding DNA. The same haplotype block was later associated with other endpoints of atherosclerosis, such as stroke (5–11) and peripheral artery disease (12–14), and also with different types of aneurysms (2, 8, 15, 16). Owing to the availability of large study cohorts and the better resolution of genetic recombination in this region, it has now become clear that associations with other phenotypes at Chr9p21 fall in distinct haplotype blocks not overlapping with the CAD block (**Figure 1A**). Close by, and proximal to the CAD locus, GWAS found associations with cancer, such as melanoma, glioma, basal cell carcinoma, and acute lymphoblastic leukemia [see (40) for review], and also with glaucoma and diverse proliferative or inflammatory diseases, such as endometriosis of the reproductive tract (41), periodontitis (42), and platelet reactivity (43). The region located distally to the CAD region contains a distinct haplotype block associated with type 2 diabetes (44, 45).

### Edited by:

Jeanette Erdmann, Universität zu Lübeck, Germany

### Reviewed by:

Yuqi Zhao, University of California, Los Angeles, United States

Clint L. Miller, University of Virginia, United States

\*Correspondence:

Daniel Teupser daniel.teupser@med.uni-muenchen.de

### Specialty section:

This article was submitted to Cardiovascular Genetics and Systems Medicine, a section of the journal Frontiers in Cardiovascular Medicine

> Received: 12 May 2018 Accepted: 01 October 2018 Published: 06 November 2018

### Citation:

Holdt LM and Teupser D (2018) Long Noncoding RNA ANRIL: Lnc-ing Genetic Variation at the Chromosome 9p21 Locus to Molecular Mechanisms of Atherosclerosis. Front. Cardiovasc. Med. 5:145. doi: 10.3389/fcvm.2018.00145

In the last 10 years, GWAS have been used successfully to increase the number of genetic loci implicated in CAD risk inheritance. The number of CAD risk loci in the genome rose from 56 by 2013 (24, 46–52) to 80 by 2015 (53–56), and to 243 by 2017 (17). For the Chr9p21 locus, the strength of association in these studies rose steadily from p = 5.40 × 10<sup>−23</sup> (rs4977575) (57), through p = 4.68 × 10<sup>−101</sup> (rs4977574) (17), to p = 8.8 × 10<sup>−223</sup> (rs4977574) (58). In populations of European descent, the risk allele frequency is very high (0.48), so that approximately one-fourth of people are homozygous for the CAD risk alleles. CAD risk SNPs on Chr9p21 have repeatedly been shown to have one of the top-ranking effect sizes [allele-specific odds ratio (OR) for CAD > 1.3] (3, 24). Despite the size of these effects, the Chr9p21 risk is independent of classically known CAD risk determinants, such as dyslipidemia, diabetes mellitus, age, and sex.
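The "approximately one-fourth" figure follows from the allele frequency if Hardy-Weinberg equilibrium is assumed (the text does not state this assumption explicitly). The sketch below reproduces that arithmetic; the second function additionally assumes a multiplicative (log-additive) model, under which an allele-specific OR of 1.3 would correspond to an OR of about 1.69 for two versus zero risk alleles. Both function names are hypothetical.

```python
def hwe_genotype_frequencies(p: float) -> dict:
    """Genotype frequencies under Hardy-Weinberg equilibrium for a biallelic
    SNP whose risk allele has frequency p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    q = 1.0 - p
    return {"risk/risk": p * p,
            "risk/non-risk": 2 * p * q,
            "non-risk/non-risk": q * q}


def homozygote_odds_ratio(per_allele_or: float) -> float:
    """OR for two versus zero risk alleles, ASSUMING a multiplicative model in
    which each risk allele multiplies the odds by the same factor."""
    return per_allele_or ** 2


freqs = hwe_genotype_frequencies(0.48)  # risk/risk is about 0.23, roughly one in four
or_hom = homozygote_odds_ratio(1.3)     # about 1.69 under the assumed model
```

Because the risk allele is so common, about half of the population (2pq, roughly 0.50) carries one copy, which is why even a moderate per-allele effect translates into a large population-attributable burden.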

The Chr9p21 region contains at least 5 genes, which are, in part, tightly clustered and overlapping. These include the 3.8 kb ANRIL non-coding RNA; the tumor suppressors cyclin-dependent kinase inhibitor CDKN2A/p16<sup>INK4A</sup>, CDKN2A/p14<sup>ARF</sup>, and CDKN2B/p15<sup>INK4B</sup>; and methylthioadenosine phosphorylase (MTAP). ANRIL overlaps the full length of the p15 gene body in antisense orientation, while sharing a bidirectional promoter with CDKN2A. Hence, it was also termed CDKN2B antisense RNA (CDKN2B-AS1). Recently, the picture has become even more complex: advances in high-throughput sequencing and adaptations in the bioinformatic mapping of RNA reads to reference genomes have revealed that thousands of genes in our genome produce not only mature linear RNA but also 3′-5′ covalently linked circular RNAs (circRNAs) (59). So far, two studies have shown that a number of circular ANRIL (circANRIL) isoforms exist, comprising different exons, whereby a downstream exon is fused to an upstream exon by the enzymatic activity of the spliceosome in a reaction termed "backsplicing" [see (60, 61) for review]. Circularized exons in ANRIL stemmed mostly from the middle parts of the lncRNA (**Figure 1A**), which are in part also shared by the linear ANRIL isoforms. CircANRIL was found not only in many different cell lines, but also in many primary cell types, including vascular smooth muscle cells (VSMCs) and macrophages, as well as in heart and vascular tissue (22, 36).

A major focus in exploring how risk is affected by Chr9p21 has been whether genetic variation affects the expression of genes at the locus in cis (**Figure 1A**) or elicits gene expression changes in trans. Top CAD-associated SNPs lie within the distal parts of long linear ANRIL isoforms (**Figure 1A**), and several studies have shown that they co-localize with sequences marked by chromatin modifications, RNA polymerase II transcription patterns and DNA motifs characteristic of bona fide transcriptional enhancers (19, 35, 62–65). Using expression quantitative trait locus (eQTL) analyses in patient samples, several groups have by now investigated whether the risk alleles at the locus are associated with the expression of specific target genes in cis (cis-eQTLs). Whereas studies investigating ANRIL expression have mostly used quantitative PCRs (qPCRs) targeting different exons of the lncRNA, expression of p14, p15, p16, or MTAP has been investigated using either genome-wide expression arrays or isoform-specific qPCRs. Here, we focus on studies investigating eQTLs in atherosclerosis cohorts and do not cover studies related to other phenotypes, such as cancer, which are reviewed elsewhere (66).

# CIS-eQTLs AT Chr9p21

ANRIL expression at Chr9p21 is complex, and at least 20 linear isoforms as well as multiple circular isoforms have been reported [www.ensembl.org, (22, 36, 39)]. In principle, linear and circular isoforms can be distinguished by the fact that the latter derive from a backsplice event, in which splicing of a downstream exon (e.g., exon 7) to an upstream exon (e.g., exon 5) can be detected. Backsplicing of ex7-5 was the most common event observed in our own study in peripheral blood monocytes (36). Concordantly, Burd and colleagues reported dominant backsplice isoforms spanning ex14-4 in peripheral blood T lymphocytes (22). In both studies, exon 1 and exons 17-20 were not contained in circularized ANRIL (**Table 1**). Thus, for classification purposes, results from studies targeting these exons will be referred to as proximal linear isoforms (containing the first ANRIL exons) and long linear isoforms (containing the distal exons 17-20) (**Table 1**). Since both linear and circular ANRIL may contain exons from the middle portion of the lncRNA (e.g., exons 4-16), it cannot be determined whether linear or circular isoforms were investigated in cases where these exons were targeted by qPCR assays that were not specific for backsplice junctions (**Table 1**).
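The classification rules just described can be written down as a small decision procedure. The sketch below is a hedged illustration, not the authors' pipeline: it assumes, as stated in the text, that exon 1 and exons 17-20 were not observed in circANRIL (the exon sets are parameters that should be adjusted for other annotations), and it treats a junction as a backsplice when the donor exon lies downstream of the acceptor exon, as in ex7-5. All names are hypothetical.

```python
from typing import Iterable, Optional, Tuple

# Assumption taken from the text: exon 1 and exons 17-20 were not found in
# circANRIL, so amplicons confined to them are treated as linear-specific.
PROXIMAL_LINEAR_EXONS = {1}
DISTAL_LINEAR_EXONS = set(range(17, 21))


def is_backsplice(donor_exon: int, acceptor_exon: int) -> bool:
    """True if a junction joins the 3' end of a downstream exon to the 5' end
    of an upstream exon (e.g. ex7-5), the hallmark of a circular RNA."""
    return donor_exon > acceptor_exon


def classify_anril_assay(exons: Iterable[int],
                         junction: Optional[Tuple[int, int]] = None) -> str:
    """Classify a qPCR assay by the ANRIL isoform class it can detect.

    exons: exons covered by the amplicon.
    junction: (donor_exon, acceptor_exon) if the amplicon spans a splice junction.
    """
    if junction is not None and is_backsplice(*junction):
        return "circular"
    covered = set(exons)
    if covered & PROXIMAL_LINEAR_EXONS:
        return "proximal linear"
    if covered & DISTAL_LINEAR_EXONS:
        return "long linear"
    return "ambiguous (linear or circular)"
```

For instance, `classify_anril_assay({5, 7}, junction=(7, 5))` returns `"circular"`, whereas an assay over the canonical ex5-6 junction returns the ambiguous class, which is exactly the situation in which Table 1 cannot assign an isoform type.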

As one of the first studies on Chr9p21, Jarinova et al. showed that ANRIL expression was increased by the CAD risk SNP rs1333049 in peripheral blood mononuclear cells (PBMCs). No significant effects on CDKN2A or on CDKN2B were recorded in that study (19). Over the years, comparable quantifications of these genes followed in whole blood, peripheral blood T lymphocytes, lymphoblastoid cell lines, aortic smooth muscle cells (SMCs) and different tissue samples known to have a role in atherosclerosis. For example, vascular tissues such as carotid atherosclerotic plaque samples and samples from the aorta, mammary artery, and heart ventricles have been analyzed, but tissues like subcutaneous or omental fat have also been used (**Table 1**). Of the 23 cis-eQTL studies conducted in the Chr9p21 CAD region to date, 16 investigated different isoforms of ANRIL, of which 10 used assays targeting proximal ANRIL exons, 8 used assays targeting the middle region, 6 used assays targeting downstream linear ANRIL exons, and two investigated backsplices contained in circANRIL (**Table 1**). Complicating a clear-cut interpretation, different studies used different risk genotypes to indicate risk haplotypes. The expression of CDKN2A and of CDKN2B was investigated in 18 studies and MTAP in 10 studies (**Table 1**).

FIGURE 1 | (top), regional association plot of CAD risk alleles and graph of recombination rate in the locus (middle), scaled linkage disequilibrium heatmap (D′) as derived from the 1000 Genomes Project dataset (Phase3V5, CEU) (bottom). The threshold for genome-wide significance of GWAS hits is indicated as a horizontal dotted line (p < 5 × 10<sup>−8</sup>). Dots for SNPs described in Table 1 are marked in yellow. The suspected core CAD risk region, corresponding to the distal region of ANRIL, has been defined experimentally through multiple CAD GWAS and is highlighted in red. The physical genomic map and the haplotype map are connected by oblique lines. Note that not all RNA transcripts and isoforms are depicted, and that the type 2 diabetes (T2D, highlighted blue) and cancer risk regions (highlighted gray) are shown in simplified form. (B) Model of how the genotype at Chr9p21 controls the balance of linear and circular ANRIL RNA expression, and potential molecular mechanisms of the different ANRIL isoforms. Linear ANRIL upregulation regulates gene expression in trans and promotes pro-adhesive, pro-proliferative, and anti-apoptotic cell functions. High levels of circANRIL inhibit over-proliferation of vascular cells by controlling rRNA maturation through impairing PES1 function in the PeBoW complex.

Overall, 80% of the studies investigating ANRIL expression found an association with the Chr9p21 genotype. Here, a trend toward higher expression of the proximal and distal exons contained in linear ANRIL was observed in patients carrying the CAD risk allele (7 of 10 and 3 of 6 studies, respectively). In contrast, circular ANRIL was downregulated in patients carrying the Chr9p21 risk haplotype in both published studies. No clear tendency was observed when assays targeting the middle region of ANRIL were used (**Table 1**). This is likely explained by the fact that these assays detect both linear and circular ANRIL isoforms, which appear to be inversely regulated. With respect to the tumor suppressor genes at the Chr9p21 locus, 78% and 67% of the studies failed to find an association of CDKN2A and CDKN2B, respectively, with Chr9p21. When an association was reported, CDKN2B in particular was downregulated in the majority of studies (94%), yet its expression did not always anticorrelate with ANRIL expression (19, 21, 22, 29). MTAP expression was not associated with the Chr9p21 genotype in any of the published studies. Overall, the picture emerges that circular ANRIL and CDKN2B tend to be downregulated in patients carrying the risk allele, whereas linear ANRIL isoforms tend to be regulated in the opposite direction (**Figure 1B**). It is currently unclear why expression of p15, or of p14 and p16, was in many cases positively correlated with ANRIL (19, 21, 22, 27, 29, 32, 34, 65). Also, MTAP, which was not associated with Chr9p21 (**Table 1**), was anticorrelated with ANRIL in some, but not all, conditions or contexts (20, 34, 67). Hypothetically, SNPs in ANRIL can affect enhancers in both directions, either by disrupting transcription factor binding sites in open chromatin (68) or by increasing enhancer activity through as yet unknown primary effects (24, 65).
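The explanation offered for the inconsistent results of middle-region assays can be checked with simple arithmetic: an assay that amplifies both isoform classes reports a weighted mixture of their fold changes, so its apparent direction depends on the relative abundance of linear and circular ANRIL in the sample. The fold changes and abundances below are hypothetical illustration values, not measurements, and the function name is our own.

```python
def middle_assay_fold_change(linear_level: float, circular_level: float,
                             linear_fc: float, circular_fc: float) -> float:
    """Apparent fold change reported by an assay that amplifies BOTH linear and
    circular ANRIL (e.g. primers in middle exons, not spanning a backsplice).

    linear_level, circular_level: hypothetical baseline abundances (same units).
    linear_fc, circular_fc: hypothetical genotype-dependent fold changes.
    """
    before = linear_level + circular_level
    after = linear_level * linear_fc + circular_level * circular_fc
    return after / before


# Identical hypothetical genotype effects (linear up 1.3-fold, circular down to
# 0.8-fold) applied to two different hypothetical isoform mixtures:
equal_mix = middle_assay_fold_change(1.0, 1.0, 1.3, 0.8)   # about 1.05: looks up-regulated
circ_rich = middle_assay_fold_change(1.0, 10.0, 1.3, 0.8)  # about 0.85: looks down-regulated
```

The same underlying regulation can therefore appear as up-regulation, down-regulation, or no change, depending on the tissue and the exons targeted, which is consistent with the absence of a clear trend for middle-region assays.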

In summary, many studies document cis-eQTLs for ANRIL or, separately, for CDKN2B (35). From the existing data, it can be concluded that these effects are cell type-specific and combinatorial. Of note, many studies investigated only very small cohorts, and those simultaneously testing both ANRIL and CDKN2B in larger cohorts (>1000 samples) identified much stronger effects of Chr9p21 on ANRIL than on CDKN2B (13, 33, 36). This observation might be explained by the haplotype block structure of the region, in which the CAD lead SNPs are located within ANRIL but their effects bleed through due to linkage disequilibrium, resulting in more subtle concomitant effects on CDKN2B expression. Another possibility is that the Chr9p21 genotype affects transcriptional enhancers at the locus, which contact and activate gene promoters affecting CAD. The consequences of such contacts would not be expected to be captured by traditional non-allelic RNA expression analysis. In fact, when allelic expression control through 3D enhancer looping was specifically measured in a separate study in human coronary artery SMCs (64), physical contacts between CAD variant-containing enhancers in the locus and the promoters of CDKN2A, CDKN2B, and ANRIL were corroborated.
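The linkage disequilibrium argument rests on standard pairwise LD measures: D, the normalized D′ shown in LD heatmaps, and r². The sketch below computes them from a haplotype frequency and two allele frequencies (hypothetical inputs), and adds the common approximation that, under a single causal variant, the expected association z-score at a correlated SNP is roughly the signed correlation r times the causal z-score. This is a simplified illustration of how a signal is attenuated rather than abolished at linked positions; it is not a model of the Chr9p21 data, and the function names are hypothetical.

```python
def ld_stats(p_ab: float, p_a: float, p_b: float):
    """Pairwise LD between two biallelic SNPs from the frequency of the A-B
    haplotype (p_ab) and the allele frequencies p_a and p_b.
    Returns (D, signed D', r^2)."""
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


def expected_tag_z(z_causal: float, r: float) -> float:
    """Approximate expected association z-score at a SNP whose genotypes have
    correlation r with a single causal variant (simplifying assumption)."""
    return r * z_causal
```

With hypothetical frequencies p_AB = 0.4 and p_A = p_B = 0.5, this gives D = 0.15, D′ = 0.6 and r² = 0.36, so a causal z-score of 10 would be expected to appear as roughly 6 at the linked SNP. Note that a high D′ does not imply a high r², which is why heatmaps of D′ and of r² can look quite different.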

Taken together, these data suggest that genetic variation within the core 9p21 CAD region relates to differential expression not only of ANRIL but, in specific cells or conditions, also of the CDKN2A/B tumor suppressors encoded in the locus. Based on work with cells in vitro, either of these factors could potentially increase cell proliferation, lead to unscheduled senescence, or elicit out-of-context inflammatory signaling; however, no study in humans or in mouse models has yet been able to decisively implicate a downstream effector pathway in vivo.

# TRANS-eQTLs AT Chr9p21 AND MOLECULAR FUNCTIONS OF ANRIL IN TRANSCRIPTIONAL REGULATION

As opposed to cis effects, two eQTL studies have so far detected modest and tissue-selective differential expression of dozens of genes associated with Chr9p21 genotype with genome-wide significance (19, 27). Affected genes were from a broad range of classes (AVPR2, PEAK1, FBLN1, KALRN, DAZL, STAU2, HLA-DQA1, BTNL8, PLEKHA6, TDGF1) in whole blood (19) and different, non-overlapping gene sets linked to tissue wounding, cell migration and inflammatory response, when analyzing heart tissue, plaques, aortas, and arteries (27).

Other, partly larger, studies in vascular tissue (20), peripheral blood mononuclear cells (PBMC, n = 2280) (33) and blood monocytes (n = 1490) (23) reported no significant expression association.

Though not directly comparable, another study showed that in macrophages cultured in vitro under inflammatory stress (IFNγ and LPS stimulation), the CAD risk genotype led to differential up- and downregulation of target genes outside the Chr9p21 locus, which were again distinct from those of the previously mentioned studies (IL1B, IL12B, CASP5, CCL8, MT1A, MT1E, MUCL1, TNIP3, VCAN, ENPP2, NDP, CD163) (30). In addition, ANRIL knockdown in cultured cell lines (69–72) and overexpression of linear ANRIL affected the expression of non-overlapping gene sets in the genome in trans (33, 36).

How ANRIL exerts trans-regulation is not known. Although one study showed a physical interaction of ANRIL with the promoters of target genes (33), this is likely not a classical enhancer RNA function [eRNA (73)], because both up- and down-regulated genes were involved and the mechanism was suggested to involve sequence homology (33). In the case of ANRIL, trans-regulation of target genes was ascribed to an ALU motif in both ANRIL and the target gene promoters (33). Similarly, an independent study found that ANRIL not only silenced its targets but, unexpectedly, also upregulated target genes: for example, the proinflammatory interleukins IL6/8 were found to be co-stimulated by ANRIL and YY1, a transcription-regulating factor that bound to the ANRIL RNA, especially in the context of TNFα/NFκB signaling (70). Therefore, contrary to what might have been expected from the reported physical interaction of ANRIL with proteins of the repressive Polycomb group complexes (74), ANRIL might be an activator, at least for some trans-regulated genes (33, 70) (see below for details). Whether circANRIL, beyond regulating rRNA maturation, is involved in primary transcriptional control, either alone or by affecting linear ANRIL's function, is not known (36). Nevertheless, it is interesting to note that circANRIL isoforms linked to CAD are produced from exons located in the middle of the ANRIL gene (22, 36) and therefore do not include the ALU motif, which is important for gene trans-regulation by linear ANRIL and is located more distally in the gene (33). Thus, variation in ANRIL RNA at the molecular level (linear vs. circular) might fundamentally alter ANRIL effector function, although this does not per se explain whether linear ANRIL regulates genes as a scaffold for promoter-activating complexes or as a decoy/inhibitor of repressive chromatin-modifying complexes. Conservatively speaking, it seems possible that Chr9p21 CAD risk genotypes affect genome-wide expression both in cis and in trans, and that linear ANRIL RNA may be one, but not the sole, important effector molecule through which the Chr9p21 locus transduces such effects (**Figure 1B**).

qPCR analysis was performed targeting proximal, middle or distal ANRIL exons (ex). In total, 23 studies are reviewed here. References: (13, 17–38). Note that the percentages (%) of studies showing up- and down-regulation of ANRIL isoforms do not necessarily add up to 100%, because different ANRIL exons were quantified in different studies, not all classes of ANRIL transcripts were analyzed in each study, and one study can report on both up- and downregulation of different isoforms belonging to the same class of ANRIL transcripts (proximal/middle/distal).

\*ANRIL linearity determined by PCR forward primer residing in exons 1, 2, or 3. \*\*ANRIL circularity determined by PCR primers detecting backsplicing between exons 10-2, 5-intron3, 6-intron3, 6-4, 7-4, 14-4, 10-4, 12-4, 13-4, 14-4, 16-4, 6-5, 7-5, 8-5, 10-5, 14-5, 16-5, 19-5, 7-6, 10-6, 14-6, 16-13\*, 16-15 (22, 36, 39).

\# cis-eQTL located in an enhancer element, but with unspecified direction of effect on expression.

# CORRELATION OF CHR9P21 GENES WITH ATHEROSCLEROSIS SEVERITY IN HUMANS AND MOUSE MODELS

Another piece of evidence for a functional role of ANRIL in determining CAD risk stems from correlation analyses with disease features in patient cohorts. Aside from the genetic association, ANRIL levels were often increased in CAD patients, not only in atherosclerotic plaque tissue but also in circulating PBMCs or whole blood. Here, linear ANRIL levels were positively correlated with the severity of atherosclerosis (13, 29, 75), whereas circANRIL levels were anticorrelated (36) (**Figure 1B**). Thus, while the Chr9p21 genotype determines the production of atherogenic (linear) over anti-atherogenic (circular) ANRIL RNA species, CAD- and peripheral artery disease-dependent changes may additionally feed into ANRIL regulation. For CDKN2B, two studies reported a correlation of expression with atherosclerosis severity (34, 76), where the direction of the correlation (downregulation in plaques) was consistent with what could be expected from the association results. However, another study reported that p16<sup>INK4A</sup> levels correlated positively, rather than negatively, with inflammation markers in plaques (25). Together, results from association as well as correlation analyses have established the lncRNA ANRIL as the prime candidate at the Chr9p21 locus.

# MOLECULAR FUNCTION OF ANRIL AND CDKN2A/B IN ATHEROGENESIS

ANRIL belongs to the group of long non-coding RNAs and, as such, has been suggested to act as a molecular scaffold for chromatin-modifying complexes that control gene expression by modifying histone tails. Specifically, ANRIL was found to physically interact with the CBX7 protein of the PRC1 Polycomb complex, one of the major gene repression complexes in cells (74). Knockdown of members of this Polycomb group complex led to increased expression of the CDKN2A and CDKN2B tumor suppressors in the Chr9p21 locus. In addition, ongoing RNA polymerase II transcription was important for the association of the Polycomb proteins with the locus, indicating the importance of RNA for recruitment. It was concluded that ANRIL's function may be, at least in part, to repress the CDKN2A and CDKN2B tumor suppressors. As a consequence, increased ANRIL levels are thought to promote overproliferation and to be incompatible with the onset of senescence, a major function of CDKN2A/B. As described above, other work has shown that recruitment of the Polycomb complexes may also account for how ANRIL regulates genes in trans on a genome-wide level. Overexpression of linear ANRIL isoforms in cultured cells was found to promote pro-atherogenic cell functions, such as proliferation and reduced apoptosis, and to trigger the differential expression of hundreds of genes, in this case without affecting the CDKN2A/B tumor suppressors. Results from that study therefore questioned whether ANRIL regulates these tumor suppressor genes in cis at all (36, 77).

How does circular ANRIL, whose abundance is reduced in CAD patients, fit into this model? In human peripheral blood T lymphocytes, as well as in PBMCs, whole blood and endarterectomy plaque tissue, circANRIL isoforms were found to be downregulated in samples from CAD patients carrying the Chr9p21 risk allele (22, 36). An initial model suggested that the production of circANRIL from central ANRIL exons would shorten the linear ANRIL lncRNA and thereby impair linear ANRIL's function in the epigenetic control of target genes (22). A second study found a more primary role for circANRIL that was, furthermore, independent of linear ANRIL (36). Here, circANRIL was found to be 10-fold more abundant than linear ANRIL. Mass spectrometric analysis of proteins interacting with circANRIL showed that it bound to the PES1 protein, a member of the evolutionarily conserved PeBoW complex. This complex is essential for proper rRNA processing, that is, the excision of spacer elements from pre-ribosomal RNA precursors. CircANRIL inhibited the activity of the PeBoW complex, as deduced from the accumulation of insufficiently processed (and non-functional) 26S and 32S pre-rRNA intermediates when circANRIL was overexpressed (36). The resulting deficit in rRNA maturation caused nucleolar stress and p53 activation, culminating in inhibition of cell proliferation and an increase in apoptosis. Notably, the observed functions of circANRIL were opposite to those of linear ANRIL and, as shown by genomic knockout of linear ANRIL exons, independent of the presence of these lncRNA isoforms. Thus, evidence from expression analysis in vivo and from genetic experiments both indicated that circANRIL is anti-atherogenic. Taken together, linear ANRIL promotes overproliferation, whereas circular ANRIL protects from it, suggesting that the Chr9p21 genotype determines the balance of linear and circular ANRIL levels in SMCs and macrophages, and that even a small dominance of linear ANRIL in this ratio may, over decades, skew toward CAD (36) (**Figure 1B**).

Whether suppressing linear ANRIL or boosting circularization is sufficient to protect from atherosclerotic cues in vivo is a matter of ongoing research. The fact that ANRIL RNA is not conserved beyond primates complicates functional in vivo analysis of the Chr9p21 locus. So far, insight into how CAD is controlled by Chr9p21, gained through genetic modeling in mouse mutants, is fragmentary. The genetic elements of Chr9p21 and their relative positions are overall syntenically conserved on mouse chromosome 4. So far, only one study has investigated whether deletion of a 70 kb portion of mouse Chr4 corresponding to the human CAD haplotype block affects atherosclerosis in vivo (78). This region contains a multi-exon lncRNA, AK148321, which, however, likely does not correspond to human ANRIL. The mutant mice (78) developed tumors, reminiscent of tumorigenesis associated with mutations in the Chr9p21 region. But despite some metabolic changes and enhanced platelet activation in the mutant mice, no significant change in atherosclerotic fatty lesion formation was observed (78), calling into question the validity of this mouse model for studying ANRIL-driven atherogenesis. On the other hand, the mutants did develop more vascular aneurysms (79), suggesting that some CAD-related aspects were indeed contained in the noncoding mouse sequence.

Overall, the picture is not yet fully clear. While the genetic data from mice support the importance of individual noncoding genetic elements and of some of the protein-coding tumor suppressors for regulation of atherosclerosis and other CAD entities, whether the lncRNA encoded in the locus regulates CAD mechanistically via epigenetically regulating the neighboring tumor suppressors in cis has not been determined. Nevertheless, mouse genetics remains an interesting research avenue to explore some aspects of Chr9p21 biology, at least relating to aneurysm, cancer, and glaucoma formation.

### SUMMARY

Starting from a GWAS signal for CAD in a "gene desert" on Chr9p21 in 2007, research in the last decade has firmly established this region as the strongest genetic factor of human atherosclerosis and has contributed to a better understanding of the underlying pathophysiology. The picture has emerged that one of the major routes by which this locus controls atherosclerosis risk is the regulation of lncRNA ANRIL expression in cis, where the risk allele leads to high levels of linear ANRIL but decreases circular ANRIL expression. Linear ANRIL has been established as a molecular scaffold guiding epigenetic protein complexes and promoting pro-atherogenic cell functions. In contrast, circularization shifts ANRIL's function toward controlling ribosomal RNA processing and protein translation, thereby promoting atheroprotection (**Figure 1B**). How the ratio of linear and circular ANRIL is controlled by the genotype at the locus is currently unresolved, and it will be important to determine which gene regulatory elements within the ANRIL gene are disturbed by causal CAD risk SNPs. Experimentally exploring the molecular effector mechanisms of linear and circular ANRIL in detail will be paramount, but this task will not be trivial, because linear and circular ANRIL isoforms always co-exist and in part share the same sequence. Finally, more nuanced relations between Chr9p21 genotype and gene expression output can be expected to emerge if, for example, future analyses take into account cell type-specific and context (stress, inflammation, senescence)-specific effects, aspects that whole-tissue expression profiling currently misses. Additionally, although it is early days, measuring the levels of circANRIL relative to linear ANRIL might offer prognostic value and help improve CAD risk stratification, or allow better monitoring of treatment response or disease recurrence. Moreover, since circANRIL levels are reduced in plaque tissue, and since circANRIL has been found to be anti-atherogenic with or without co-existing linear ANRIL, increasing circANRIL abundance in patients could also be of therapeutic relevance. Increasing circANRIL levels in the cells of the vasculature in CAD disease models might, therefore, be a promising next step to exploit the accumulated knowledge on the Chr9p21 CAD risk locus.
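How a circANRIL/linear ANRIL ratio would be measured in practice is not specified here. One common option for two qPCR assays run on the same sample is the comparative Ct (ΔCt) approach, sketched below under explicit assumptions: a backsplice-spanning assay for circANRIL and a linear-specific assay are available, both amplify with the same efficiency, and the Ct values shown are hypothetical. The function names are our own.

```python
import math


def circ_to_linear_ratio(ct_circ: float, ct_linear: float,
                         efficiency: float = 2.0) -> float:
    """Relative abundance of circANRIL versus linear ANRIL from qPCR threshold
    cycles (Ct) measured in the same sample (comparative Ct approach).

    Lower Ct means more template. ASSUMES both assays amplify with the same
    efficiency (2.0 = perfect doubling per cycle); unequal efficiencies bias
    the ratio and would need to be measured and corrected for.
    """
    return efficiency ** (ct_linear - ct_circ)


def log2_ratio(ratio: float) -> float:
    """Symmetric scale for comparing ratios across samples (0 = equal abundance)."""
    return math.log2(ratio)


# Hypothetical Ct values for a single sample:
ratio = circ_to_linear_ratio(ct_circ=25.0, ct_linear=28.0)  # 8.0: circular 8-fold more abundant
```

Working on the log2 scale keeps increases and decreases of the ratio symmetric, which is convenient if such a ratio were ever compared between genotype groups or used as a prognostic marker; whether it has prognostic value remains, as noted above, an open question.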

## AUTHOR CONTRIBUTIONS

All authors listed have made a substantial, direct and intellectual contribution to the work, and approved it for publication.

### ACKNOWLEDGMENTS

We thank Bernd Northoff for bioinformatics analyses and his aid in preparing the Figure. This work was in part funded by the German Research Foundation (DFG) as part of the Collaborative Research Center CRC1123 Atherosclerosis–Mechanisms and Networks of Novel Therapeutic Targets (project B1) and by the Leducq-foundation CADgenomics.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Holdt and Teupser. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Into the Wild: GWAS Exploration of Non-coding RNAs

Hector Giral<sup>1,2</sup>, Ulf Landmesser<sup>1,2,3</sup> and Adelheid Kratzer<sup>1,2</sup>\*

<sup>1</sup> Department of Cardiology, Charité - Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, Berlin, Germany, <sup>2</sup> DZHK (German Centre for Cardiovascular Research), Partner Site Berlin, Berlin, Germany, <sup>3</sup> Berlin Institute of Health (BIH), Berlin, Germany

Genome-wide association studies (GWAS) have proven to be a fundamental tool for identifying common variants associated with complex traits, thus helping to unveil the genetic components of human disease. In addition, GWAS exposed unexpected findings that called for a redefinition of the framework of population genetics. First, loci identified by GWAS had small effect sizes and could explain only a fraction of the predicted heritability of the traits under study. Second, the majority of GWAS hits mapped to non-coding regions (such as intergenic or intronic regions), where new functional RNA species (such as lncRNAs or circRNAs) have started to emerge. Bigger cohorts, meta-analyses and technical improvements in genotyping have allowed the identification of an increasing number of genetic variants associated with coronary artery disease (CAD) and cardiometabolic traits. The challenge remains to infer the causal mechanisms by which these variants influence cardiovascular disease development. A tendency to assign potential causal variants preferentially to coding genes close to lead variants has led to the role of non-coding elements being disregarded. In recent years, in parallel with increasing knowledge of the non-coding genome, new studies have started to characterize disease-associated variants located within non-coding RNA regions. The emergence of databases integrating single-nucleotide polymorphisms (SNPs) and non-coding RNAs, together with novel technologies, will hopefully facilitate the discovery of causal non-coding variants associated with disease. This review attempts to summarize the current knowledge of genetic variation within non-coding regions, with a focus on long non-coding RNAs that have a widespread impact on cardiometabolic diseases.

### Edited by:

Jeanette Erdmann, Universität zu Lübeck, Germany

### Reviewed by:

Baiba Vilne, Technische Universität München, Germany

Thorsten Kessler, Deutsches Herzzentrum München, Germany

Arne S. Schaefer, Charité Universitätsmedizin Berlin, Germany

> \*Correspondence: Adelheid Kratzer adelheid.kratzer@charite.de

### Specialty section:

This article was submitted to Cardiovascular Genetics and Systems Medicine, a section of the journal Frontiers in Cardiovascular Medicine

> Received: 05 July 2018 Accepted: 03 December 2018 Published: 17 December 2018

### Citation:

Giral H, Landmesser U and Kratzer A (2018) Into the Wild: GWAS Exploration of Non-coding RNAs. Front. Cardiovasc. Med. 5:181. doi: 10.3389/fcvm.2018.00181

Keywords: lncRNA, genetic variant, GWAS, coronary artery disease, cardiometabolic disorders

At the dawn of the millennium, the first draft of the human genome represented a major milestone on the path to deciphering the genetic component of human disease. Further refinement of the human genome by the 1,000 Genomes Project mapped over 88 million variants from 26 populations, of which ∼20 million are common (frequency >0.5%) single-nucleotide polymorphisms (SNPs), a coverage of >95% of all estimated human common SNPs (1, 2). Other consortia, such as the Encyclopedia of DNA Elements (ENCODE) (3, 4) and Functional Annotation of the Mammalian Genome (FANTOM) (5), contributed to a detailed atlas of functional DNA elements and transcriptional units. This atlas revealed that more than 80–90% of the human genome is transcribed and displays some functionality (4). In this context, genome-wide association studies (GWAS) emerged as a fundamental tool to define SNPs associated with complex human traits or diseases (6–10). With regard to cardiovascular disease, GWAS have identified up to 161 genetic risk loci associated with coronary artery disease (CAD) (11–13).

Despite the profound contributions of GWAS to the understanding of human disease pathophysiology, some issues have forced a redefinition of the framework of GWAS. First, most significant GWAS hits could explain only a small fraction of the genetic variance of a specific trait (14). In the case of CAD, all 161 genome-wide significant loci account for 15.1% of the predicted genetic contribution to the disease (15). This is strikingly similar to the percentage of gene sets (13.9%) or gene networks (14%) implicated in these 161 CAD-associated loci (12). An emerging notion, known as the omnigenic model, states that cell regulatory networks are so deeply interconnected that essentially all genes expressed in disease-relevant cell types contribute to the heritability of complex traits (16). This model therefore assumes that thousands of loci with small effect sizes contribute to the overall heritability of a trait or disease by affecting the expression of a smaller set of core genes (16). The common disease-common variant (CD-CV) model that drove the first decade of GWAS appears to be giving way to a complex trait-complex genetics (CT-CG) scenario, in which a handful of relevant variants cannot fully explain genetic variation in whole populations. The overall notion of a widespread dispersion of genetic contributions to disease, due to the interconnectivity of biological systems, seems to be widely accepted. On the other hand, the concept of a set of core genes driving the phenotype of complex diseases is still controversial. As a result, so is the choice of methodology for the future of the field (17).

Nearly 90% of all phenotype-associated SNPs identified by GWAS lie within non-coding regions (18–20). These span a broad spectrum of locations, including intronic or promoter regions, small ncRNAs such as miRNAs, long ncRNAs, antisense transcripts, and enhancer or insulator regions. Most non-coding variants are concentrated in deoxyribonuclease I (DNase I) hypersensitive sites, which mark regions of increased chromatin accessibility. Currently, around 2,500 miRNAs and more than 50,000 lncRNAs have been annotated in the human genome, practically doubling the number of protein-coding transcripts and highlighting the potential importance of this part of the genome (21).

This review summarizes genetic variation within lncRNAs associated with cardiovascular disease (CAD, MI) and with cardiometabolic risk factors for cardiovascular disease, such as lipoprotein metabolism, diabetes or hypertension (**Table 1**).

### IMPACT OF GENETIC VARIANTS ON LNCRNA FUNCTIONALITY

One of the longest-standing challenges in human genetics is to assess the potential causality of every variant within a locus that is in close linkage disequilibrium (LD) with the lead variant (34). Despite the potential of lncRNAs as causal factors of disease, GWAS tended to explore genetic variant causality preferentially in coding genes, mostly because of our limited knowledge of ncRNA genomic structure and functionality. Additionally, when searching for potential causal variants, lncRNAs overlapping coding genes (such as antisense and intronic lncRNAs) are harder to dissociate from neighboring coding genes than intergenic lncRNAs (lincRNAs), which do not overlap coding genes. Fortunately, interactive lncRNA databases (LincSNP2.0) (35), together with established GWAS catalogs such as the NHGRI-EBI catalog (36) and GWASdb v2 (37), have started to integrate newly identified lncRNA transcripts and disease-associated genetic variants. The latest of these databases mapped 371,647 disease-associated SNPs to lncRNAs, which accounts for approximately 45% of all identified disease-associated human SNPs (35).
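Statistically, the candidates for causality at a locus are the variants in strong LD with the lead SNP. The following is a minimal sketch of how such candidates can be listed. It assumes unphased genotype dosages and uses the common approximation of r² as the squared Pearson correlation between dosage vectors. The variant names, toy genotypes and the 0.8 cut-off are hypothetical and are not drawn from the cited studies.

```python
# Hypothetical sketch: LD (r^2) between a lead SNP and nearby variants,
# approximated as the squared Pearson correlation of unphased genotype
# dosages (0, 1 or 2 copies of the alternate allele per individual).

def ld_r2(g1: list[int], g2: list[int]) -> float:
    """Return the squared correlation between two dosage vectors."""
    n = len(g1)
    if n != len(g2) or n < 2:
        raise ValueError("need two dosage vectors of equal length >= 2")
    m1, m2 = sum(g1) / n, sum(g2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(g1, g2))
    var1 = sum((a - m1) ** 2 for a in g1)
    var2 = sum((b - m2) ** 2 for b in g2)
    if var1 == 0 or var2 == 0:
        return 0.0  # monomorphic variant: LD is undefined, report 0
    return cov * cov / (var1 * var2)


def variants_in_ld(genotypes: dict[str, list[int]], lead: str,
                   threshold: float = 0.8) -> dict[str, float]:
    """Return variants whose r^2 with the lead variant reaches the threshold."""
    lead_g = genotypes[lead]
    hits = {}
    for name, g in genotypes.items():
        if name == lead:
            continue
        r2 = ld_r2(lead_g, g)
        if r2 >= threshold:
            hits[name] = r2
    return hits


# Toy dosages for 8 individuals (made-up data, not from any study).
genotypes = {
    "lead": [0, 1, 2, 1, 0, 2, 1, 0],
    "snpA": [0, 1, 2, 1, 0, 2, 1, 0],  # identical to the lead variant
    "snpB": [0, 1, 2, 1, 0, 2, 1, 1],  # differs in one individual
    "snpC": [2, 0, 1, 0, 2, 1, 0, 1],  # largely unrelated
}
print(variants_in_ld(genotypes, "lead"))
```

Here snpA and snpB pass the cut-off (r² of 1.0 and about 0.82), whereas snpC (about 0.19) does not. LD alone cannot separate snpA from snpB. This is why functional annotation, for instance of lncRNA exons and enhancers, is needed to prioritize among them.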

Recent approaches have focused on lincRNAs by further exploring loci previously associated with CAD (32, 38–41). For example, a class-level testing framework termed Genetic Class Association Testing (GenCAT) allowed the identification of new trait-associated variants within multiple lincRNAs, contributing novel insights into their role in cardiometabolic pathophysiology (42). The GenCAT approach includes SNPs located directly within the lincRNA, but also those 500 kb up- or downstream of it (38).
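The window rule described for GenCAT amounts to a simple interval test. The sketch below is not the GenCAT implementation. It only illustrates assigning a variant to a lincRNA when both are on the same chromosome and the variant falls inside the transcript body or within 500 kb on either side. Because the window is symmetric, strand does not matter. The transcript and variant names and coordinates are hypothetical.

```python
from dataclasses import dataclass

FLANK_BP = 500_000  # 500 kb on each side, as described for GenCAT in the text


@dataclass(frozen=True)
class Transcript:
    name: str
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


@dataclass(frozen=True)
class Variant:
    vid: str
    chrom: str
    pos: int


def variants_in_window(tx: Transcript, variants: list[Variant],
                       flank: int = FLANK_BP) -> list[str]:
    """IDs of variants inside the transcript body or within `flank` bp of it."""
    lo = max(1, tx.start - flank)
    hi = tx.end + flank
    return [v.vid for v in variants
            if v.chrom == tx.chrom and lo <= v.pos <= hi]


# Hypothetical lincRNA and variant coordinates, for illustration only.
linc = Transcript("lincX", "chr9", 1_000_000, 1_050_000)
variants = [
    Variant("v_body", "chr9", 1_020_000),       # inside the transcript
    Variant("v_upstream", "chr9", 600_000),     # 400 kb before the start
    Variant("v_edge", "chr9", 500_000),         # exactly 500 kb before the start
    Variant("v_far", "chr9", 1_600_000),        # 550 kb after the end
    Variant("v_other_chr", "chr1", 1_020_000),  # different chromosome
]
print(variants_in_window(linc, variants))  # ['v_body', 'v_upstream', 'v_edge']
```

Setting `flank=0` restricts the test to variants inside the transcript body. Comparing the two results shows how much the flanking window enlarges the set of variants attributed to each lincRNA.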

From a functional perspective, many lncRNAs reside in the nucleus, where they conduct key regulatory steps in gene transcription, transcript splicing or chromatin structure. Cytoplasmic lncRNAs affect cell homeostasis by modulating mRNA translation and stability through scaffolding of the multi-protein complexes that accomplish these functions (43). Several lncRNA functions depend on structural domains that generate binding sites for RNA-binding proteins (RBPs), allowing lncRNAs to act as scaffolds for the recruitment of proteins, RNA molecules and DNA elements (44–46). Some genetic variants are predicted to alter lncRNA secondary structure and thereby lncRNA–RBP interactions, which can dramatically affect lncRNA functionality. The low evolutionary conservation of lncRNAs makes it challenging to predict structural domains and, consequently, how genetic variants induce functional modifications (47). However, analysis of variation frequencies suggested that functional elements in lncRNAs have a much lower variation frequency, almost comparable to that of protein-coding exons (48). Alternative splicing is an additional mechanism that generates functional diversity of lncRNAs through differential arrangement of structural domains (19).

Furthermore, SNPs may affect lncRNA expression by altering its promoter region. They may also influence the expression of proximal or distal protein-coding genes through the action of enhancers (19). Trans-regulation of distant genes is mediated by enhancer-associated lncRNAs, but the effect of induced chromatin structural changes must also be considered. Chromatin structural loops link regulatory enhancer elements to distant gene promoters, and variants disrupting this process broadly influence gene expression (49). Distal regulatory elements (DREs) can regulate lincRNA transcription through chromatin interactions, which can be influenced by GWAS-identified SNPs and thereby determine disease association (50).

# LONG NON-CODING RNAS ASSOCIATED WITH CARDIOMETABOLIC TRAITS

The first examples of SNP variants located within a lncRNA and associated with increased CAD risk were identified in the chr9p21.3 locus, which proved to be the CAD risk locus with the strongest effect found to date. The chr9p21.3 locus contains multiple SNP variants in the antisense non-coding RNA in the INK4 locus (ANRIL), now referred to as CDKN2B-AS1 (51–53). CDKN2B-AS1 spans 126.3 kb in a gene cluster next to three tumor suppressor genes (p15/CDKN2B, p16/CDKN2A and p14/ARF), partially overlapping CDKN2B (53–55). Several CDKN2B-AS1 SNP variants are also associated with other disease traits, such as ischemic stroke, aortic aneurysm, atherosclerosis, specific carcinomas and type 2 diabetes (T2D) (22, 56–58).

TABLE 1 | List of representative lncRNAs and variants identified by GWAS associated with cardiometabolic disease traits.

A profile of well-studied lncRNAs such as ANRIL, as well as other candidate lncRNAs carrying variants associated with CAD, MI or cardiometabolic traits that are high risk factors for the disease, such as T2D, hypertension and obesity.

<sup>a</sup>Official ID name; Gencode (GRCh38.p12); aliases.

<sup>b</sup>Official variant ID; variant location in the locus.

<sup>c</sup>NHGRI GWAS Catalog study ID; reference of the study; p-value.

<sup>d</sup>Alternative sources such as the GWASdb catalog or meta-analysis studies; reference of the study; p-value.

<sup>e</sup>GTEx eQTLs; genes associated with the variant (number of hits in the database).

Most SNPs in the core CAD risk region are located within intronic areas of CDKN2B-AS1 (118 out of 131 variants), where several enhancers have been described (59). These enhancers mediated cis-regulation of neighboring genes such as CDKN2A/B or methylthioadenosine phosphorylase (MTAP). They also mediated trans-regulation of genes such as interferon-α21 (IFNA21), one million base pairs upstream (59). Trans-regulation of gene expression by CDKN2B-AS1 increased cell adhesion and proliferation, both atherogenic processes, through a mechanism partially mediated by Alu elements located in CDKN2B-AS1 (60). Interestingly, CDKN2B-AS1 interacted with components of polycomb repressive complexes 1 and 2 (PRC1 and PRC2), which control the epigenetic repression of the CDKN2B gene (61, 62). Moreover, risk variant rs10757278, located at enhancer ECAD9 inside CDKN2B-AS1, disrupted the binding site of the STAT1 transcription factor (59). In lymphoid cells, this disruption of STAT1 binding resulted in a failure to recruit the repressor machinery and in increased CDKN2B-AS1 expression. This mechanism was confirmed by silencing of STAT1 (**Figure 1A**) (59).

Only five of the CAD candidate variants are located in exons of CDKN2B-AS1, and none of them lies in conserved elements, which makes it questionable whether they affect functional domains (59). However, numerous splice isoforms have been identified for CDKN2B-AS1 (14 isoforms in GenBank; 21 isoforms in GENCODE). This highlights a complex alternative splicing regulation that potentially affects the organization of the lncRNA's structural domains and thereby modulates its functionality (64). Carriers of the risk haplotype showed increased expression of the CDKN2B-AS1 splice isoforms EU741058 (short form) and NR\_003529 (long form), but not DQ485454 (short form). This expression directly correlated with the severity of atherosclerosis, suggesting distinct roles for CDKN2B-AS1 splicing variants (65). Additionally, splicing isoforms defined by their polyadenylation site in the proximal (exon 13) or distal (exon 19) region trans-regulated different sets of genes. Proximal CDKN2B-AS1 isoforms modulated the expression of glucose and lipid metabolism genes (66), while distal isoforms regulated RBMS1 (RNA Binding Motif Single Stranded Interacting Protein 1), a cell cycle suppressor (67). Conversely, circularized CDKN2B-AS1, the product of another form of alternative splicing, showed an atheroprotective role via interaction with pescadillo homolog 1 (PES1), which leads to impaired ribosomal biogenesis (68). A SNP located in the 3′ region of CDKN2B-AS1 was associated with reduced expression of CDKN2A, CDKN2B and CDKN2B-AS1, but also with increased vascular smooth muscle cell (VSMC) proliferation (69). Other CDKN2B-AS1 variants confer increased myocardial infarction (MI) risk (70). This supports previous findings that CDKN2B-AS1 levels increase significantly in peripheral blood mononuclear cells after MI (71). Despite great efforts, the causal mechanisms of CDKN2B-AS1 variants remain elusive and have not yet been fully unraveled. For further detail, we refer the reader to other excellent recent reviews on the topic (23, 53, 72, 73).

Myocardial infarction associated transcript (MIAT) was identified as a susceptibility locus for MI in a Japanese population in a large-scale case-control association study (63). Upregulation of MIAT expression in an MI mouse model, concomitant with increased cardiac interstitial fibrosis, suggested a profibrotic role with a prominent impact on MI pathogenesis (74). Furthermore, ex vivo experiments with a diabetic rat model identified a regulatory feedback loop between MIAT, vascular endothelial growth factor (VEGF) and miR-150-5p. MIAT acts as a sponge for miR-150-5p and thereby prevents miR-150-5p-mediated degradation of VEGF (**Figure 1B**) (75). Expression of both MIAT and CDKN2B-AS1 was increased in human atherosclerotic arteries, suggesting a potential role of MIAT in atherosclerotic plaque development (76).

The embryonic lincRNA H19 was found to be re-expressed in human atherosclerotic plaques and in a rat model of carotid artery injury (77, 78). Recently, a genotyping study of 4 SNPs in the H19 locus demonstrated a significant association with CAD in a Chinese population (26). Additional GWAS and meta-analyses showed an association of H19 variants with blood pressure, a well-known risk factor for cardiovascular disease (24, 25). Mechanistically, H19 was proposed to modulate the availability of several let-7 miRNAs by acting as a molecular sponge (79). H19 is highly expressed in adult muscle tissue, where its modulation of let-7 likely controls the timing of muscle differentiation: H19 depletion accelerates in vitro muscle differentiation with a concomitant overexpression of let-7 (79). Additionally, H19 was highly upregulated in two different mouse models of abdominal aortic aneurysm, whereas specific H19 knock-down limited aneurysm growth through a mechanism involving decreased apoptosis of smooth muscle cells (80). GWAS have identified other lncRNAs that contain CAD-associated genetic variants but whose putative causal mechanisms have not been studied further. Examples are LOC400684, an uncharacterized antisense RNA in the Zinc Finger Protein 507 (ZNF507) locus (12), and LINC00310, whose variant rs28451064 is also associated with myocardial infarction (13).

Genome-wide analyses also revealed multiple variants associated with cardiometabolic traits such as cholesterol levels or T2D, both established risk factors for cardiovascular disease. For example, a genetic variant lying in the lincRNA LOC157273 was associated with lipid (HDL cholesterol) (27) and glycemic (fasting insulin levels) (29) traits, but also with coronary artery calcification (28). Genetic variants at LOC157273 were associated with expression changes of the nearby gene PPP1R3B, a phosphatase involved in hepatic regulation of glucose (81). Another SNP (rs886424), located in the second exon of LINC00243, was associated with total cholesterol and triglyceride levels (32). Expression quantitative trait loci (eQTL) analysis also associated variant rs886424 with the expression levels of LINC00243, as well as with those of numerous nearby immune-related genes, including immediate early response 3 (IER3) and several HLA forms (32). IER3 was reported to inhibit pro-inflammatory cytokines. However, the exact role of LINC00243 in immune function and its putative link to cardiometabolic diseases require further evaluation. One of the SNPs associated with T2D (rs231362) in the KCNQ1 locus overlaps both the antisense lncRNA KCNQ1OT1 and intron 11 of KCNQ1 (32). Several other polymorphisms in the KCNQ1 locus are also associated with cardiovascular events (82), and some showed a protective effect against arrhythmic risk in long-QT syndrome (83). Both KCNQ1OT1 and CDKN2B-AS1 were shown to be valid predictors of left ventricular dysfunction after MI (71). KCNQ1OT1 is an imprinted gene that is expressed only from the paternal allele and is responsible for silencing a proximal cluster of genes (84). Mechanistically, KCNQ1OT1 acts as a scaffold for the chromatin modifiers G9a (a histone methyltransferase) and PRC2, as well as the DNA methyltransferase Dnmt1. These repress genes through histone modifications and DNA methylation, respectively (84).

FIGURE 1 | (A) … STAT1 transcription factor of enhancer region ECAD9. In lymphoid cells, the binding of STAT1 to this region has been associated with decreased ANRIL expression, whereas silencing of STAT1 led to enhanced expression of ANRIL. The risk variant rs10757278 disrupts the binding of STAT1, and the repression of ANRIL expression is abrogated. Increased expression of ANRIL promotes downregulation of CDKN2B/p15 gene expression and underlies a proliferative effect that presumably increases cardiovascular disease susceptibility. (B) Potential regulatory mechanisms of MIAT expression through different variants. Ishii et al. (63) identified various variants in the lincRNA MIAT, such as rs3132291, and associated them with myocardial infarction. Some variants in exon 5 have been associated with increased MIAT expression. Yan et al. showed that MIAT can bind miR-150-5p in endothelial cells, thereby inhibiting the degradation of its direct target VEGF. These data suggest that certain variants in the MIAT lincRNA could modify the structure of MIAT. This would lead to increased binding of miR-150-5p and consequently inhibit the degradation of its target genes, such as VEGF.

Finally, the ARIC (Atherosclerosis Risk in Communities) study sought to identify genetic loci associated with ECG global electrical heterogeneity (GEH), and consequently with changes in QT measurements. One of the identified loci contained the lncRNA LINC02137 (33). LINC02137 was highly expressed in the atrial appendage of the human heart. In atrial tissue, eQTL analysis showed that variant rs4784934 was significantly associated with the expression of LINC02137 and of the gene NDRG4. NDRG4 was reported to be necessary for sodium channel trafficking in the nervous system, but it has also been associated with cardiomyopathy (85).

# FUTURE PERSPECTIVES OF LNCRNA GENETIC VARIANTS

Determining potential causality among genetic variants associated with cardiovascular and cardiometabolic diseases remains a challenging task. In the functional analysis of lncRNAs, it is important to consider their low expression levels and high degree of tissue and cell type specificity. Tissue-specific expression quantitative trait loci (eQTL) analysis of lncRNAs is therefore a powerful tool to associate certain variants with downstream effectors. The Genotype-Tissue Expression (GTEx) project makes it possible to study tissue-specific gene expression and regulation on a large scale, covering 44 tissues from 449 individuals. It thus provides a rich resource for identifying genetic associations for both local (cis-eQTL) and distal (trans-eQTL) effects (86). Nonetheless, this tool has inherent limitations. Because of the multiple-testing burden, eQTLs with small effect sizes cannot be detected. Moreover, eQTL effects are strongly tissue specific, which hinders the inference of functionality, so caution must be taken when extrapolating conclusions to other tissues.
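At its core, a single cis-eQTL test regresses a gene's expression on genotype dosage. The multiple-testing burden comes from repeating that test for every variant–gene pair. The sketch below shows both parts in a simplified form. It fits one ordinary least-squares slope with its t-statistic and computes a Bonferroni per-test threshold. The toy data and function names are hypothetical. Real eQTL pipelines additionally adjust for covariates and typically use permutation- or FDR-based thresholds.

```python
import math


def eqtl_fit(dosage: list[float], expression: list[float]) -> tuple[float, float]:
    """Least-squares slope of expression on genotype dosage and its t-statistic."""
    n = len(dosage)
    if n != len(expression) or n < 3:
        raise ValueError("need >= 3 paired observations")
    mx = sum(dosage) / n
    my = sum(expression) / n
    sxx = sum((x - mx) ** 2 for x in dosage)
    if sxx == 0:
        raise ValueError("genotype is monomorphic in this sample")
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosage, expression))
    beta = sxy / sxx                        # change in expression per allele
    intercept = my - beta * mx
    rss = sum((y - intercept - beta * x) ** 2 for x, y in zip(dosage, expression))
    if rss == 0:
        return beta, math.inf               # perfect fit: t is unbounded
    se = math.sqrt(rss / (n - 2) / sxx)     # standard error of the slope
    return beta, beta / se


def bonferroni_threshold(n_tests: int, family_alpha: float = 0.05) -> float:
    """Per-test significance threshold after Bonferroni correction."""
    return family_alpha / n_tests


# Toy data: six individuals, dosage 0/1/2 and normalized expression (made up).
dosage = [0, 0, 1, 1, 2, 2]
expression = [1.0, 1.2, 2.0, 2.1, 3.1, 2.9]
beta, t = eqtl_fit(dosage, expression)
print(f"beta={beta:.2f}, t={t:.1f}")
for n_tests in (1, 1_000, 1_000_000):
    print(n_tests, bonferroni_threshold(n_tests))
```

The threshold shrinks in proportion to the number of tests. At about one million independent tests, it reaches the conventional 5 × 10⁻⁸ genome-wide level. This is why a modest but real eQTL effect can fail to reach significance in a large scan.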

Novel lncRNAs were localized near leukocyte enhancers and close to GWAS-identified risk variants for autoimmune diseases. This suggests that alterations in enhancers or super-enhancers might be associated with changes in phenotype and disease risk (87). SNPs located near lncRNAs, or even at a distance from them (e.g., acting in trans), may help unravel the complex regulatory events of cardiovascular disease, including the underlying importance of enhancers or super-enhancers (88). Yet the term "super-enhancer" is under debate, since no clear definition has been established and their functional properties do not necessarily set them apart from regular enhancers (89). Another task for future studies is to determine the role of lncRNAs and their genetic variants in the maintenance and remodeling of the chromatin structure that drives interactions between enhancers and transcription initiation sites. Chromosome conformation capture (3C)-based technologies, such as Hi-C (90, 91) or chromatin interaction analysis by paired-end tag sequencing (ChIA-PET) (92), will be useful genome-wide approaches for studying chromatin structural changes. They can also define the impact of genetic variants on long-range chromatin interactomes.

The advent of new sequencing technologies with improved throughput, read length and cost will increase the number of annotated lncRNAs and help define their complex transcript models. One such technology is capture long-read sequencing (CLS). This technique combines lncRNA capture enrichment with nanopore sequencing to read longer fragments (∼1.5 kb) and characterize lncRNA structure (93). This highly promising approach could greatly facilitate defining exon connectivity and, therefore, splicing transcript models.

Another area for improvement is our ability to predict and characterize lncRNA structural motifs and their underlying functional domains. Computational approaches can predict the formation of loops and simple helices but are less successful at defining more complex motifs (94). New high-throughput techniques based on next-generation sequencing (NGS) have emerged to define new motifs and validate computational predictions on a genome-wide scale (94). These methods combine RNA nucleases specific for single-stranded or double-stranded RNA, or chemical probes, with NGS to analyze full transcriptomes. Examples include Parallel Analysis of RNA Structure (PARS) (95), Fragmentation Sequencing (96) and selective 2′-hydroxyl acylation analyzed by primer extension (SHAPE) (97, 98). For a detailed functional characterization of lncRNAs, newly identified structural domains should be linked to interactome information. Such information can be obtained with novel technologies such as chromatin isolation by RNA purification (ChIRP) (99) and capture hybridization analysis of RNA targets (CHART) (100). These techniques allow the identification of specific lncRNA interacting partners such as RBPs, and they can also delimit the interaction sites to specific domains within the RNA molecule.
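In PARS, RNase V1 (which cuts paired bases) and nuclease S1 (which cuts unpaired bases) produce per-nucleotide cleavage read counts. These are compared on a log2 scale. The sketch below is a simplified version of that scoring under stated assumptions. It assumes normalization by library totals and a pseudocount of 1, and the published method's normalization details may differ. Count vectors and the function name are hypothetical.

```python
import math


def pars_scores(v1_counts: list[int], s1_counts: list[int],
                pseudocount: float = 1.0) -> list[float]:
    """Per-nucleotide log2 ratio of library-normalized V1 to S1 cleavage counts.

    RNase V1 cleaves paired (double-stranded) positions and nuclease S1 cleaves
    unpaired ones, so positive scores suggest pairing and negative scores
    suggest single-stranded positions. The normalization by library totals and
    the pseudocount are simplifying assumptions of this sketch.
    """
    if len(v1_counts) != len(s1_counts):
        raise ValueError("count vectors must cover the same positions")
    v1_total = sum(v1_counts)
    s1_total = sum(s1_counts)
    if v1_total == 0 or s1_total == 0:
        raise ValueError("each library needs at least one read")
    scores = []
    for v, s in zip(v1_counts, s1_counts):
        v_frac = (v + pseudocount) / v1_total  # share of V1 reads at this base
        s_frac = (s + pseudocount) / s1_total  # share of S1 reads at this base
        scores.append(math.log2(v_frac / s_frac))
    return scores


# Made-up counts for a 3-nt stretch: balanced, S1-dominated, V1-dominated.
print([round(x, 2) for x in pars_scores([10, 0, 30], [10, 20, 10])])
```

In this toy example, the second position scores strongly negative (likely unpaired) and the third positive (likely paired). Mapping a variant onto such a profile is one way to ask whether it falls in a structured lncRNA domain.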

Lastly, it will be relevant to understand the potential effects of genetic variants within lncRNAs on the regulation of CpG islands in cardiometabolic disorders (32). In fact, an integrative analysis of 11 human data sets generated a reference human epigenome as a framework for characterizing GWAS variants that alter the epigenomic profile in complex human diseases (101). This framework can also be used to profile the non-coding genome.

In summary, in the post-GWAS era many relevant factors must be considered in order to study the effect of genetic variation in lncRNAs. These include differential tissue expression, splicing isoform models, RNA structure prediction and functional domain identification, and the identification of lncRNA interacting partners such as RBPs. The high proportion of disease-associated SNPs lying in non-coding regions has highlighted their functional relevance. It calls for a better understanding of lncRNA biology, as well as of regulatory regions such as enhancers, to unravel their potential role in cardiometabolic diseases. Expanding the GWAS field to explore the functionality of lncRNAs, but also of other non-coding RNAs, may reveal novel causal regulatory mechanisms of cardiovascular disease. This research area promises new insights into the mechanisms that determine the genetic component of human disease and will clear the path toward a personalized medicine approach.

### REFERENCES


### AUTHOR CONTRIBUTIONS

AK and HG screened the literature on the topic, drafted, wrote and revised the article. UL revised the article.

### ACKNOWLEDGMENTS

We acknowledge support from the German Research Foundation (DFG) and the Open Access Publication Fund of Charité— Universitätsmedizin Berlin.




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The reviewer AS declared a shared affiliation, with no collaboration, with the authors to the handling Editor.

Copyright © 2018 Giral, Landmesser and Kratzer. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.