Relevance of Multi-Omics Studies in Cardiovascular Diseases

Cardiovascular diseases are the leading cause of death worldwide. Despite the large number of genes and loci identified, the precise mechanisms by which these genes influence the risk of cardiovascular disease are not well understood. Recent advances in the development and optimization of high-throughput technologies for generating “omics” data have provided a deeper understanding of the processes and dynamic interactions involved in human diseases. However, the integrative analysis of omics data is not straightforward and presents several logistical and computational challenges. In spite of these difficulties, several studies have successfully applied integrative genomics approaches to investigate novel mechanisms and plasma biomarkers involved in cardiovascular diseases. In this review, we summarize recent studies aimed at understanding the molecular framework of these diseases using multi-omics data from mice and humans. We discuss examples of omics studies of cardiovascular diseases focused on the integration of genomics, epigenomics, transcriptomics, and proteomics. This review also describes current gaps in the study of complex diseases using systems genetics approaches, as well as potential limitations and future directions of this emerging field.


INTRODUCTION
Coronary artery disease (CAD) is the most common cause of cardiovascular death (1). Studies conducted in twins (2,3) and in the general population have estimated the heritability of CAD at ∼40-50% (4). In addition, genome-wide association studies (GWAS) have identified more than 150 genetic loci associated with CAD risk (5)(6)(7)(8)(9)(10)(11)(12)(13)(14)(15)(16)(17)(18). GWAS have been successful in identifying common DNA variation implicated in cardiovascular diseases. However, they provide little or no molecular evidence of gene causality. In this context, the premise that rare genetic variation could have stronger functional effects on disease manifestation remains debatable (19). These limitations have motivated researchers to integrate genetic studies with additional high-throughput data designed to interrogate the transcriptome, epigenome, proteome, metabolome, and other molecular layers.
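Twin-based heritability estimates can be illustrated with a classical approximation, Falconer's formula. It compares the phenotypic correlation between monozygotic (MZ) twins, who share essentially all of their genome, with that between dizygotic (DZ) twins, who share on average half of their segregating variants. The cited studies may rely on more elaborate variance-component models. For a binary outcome such as CAD, the correlations are usually estimated on an underlying liability scale. The twin correlations below are hypothetical and only show the arithmetic.

```latex
h^2 \approx 2\,(r_{MZ} - r_{DZ}),
\qquad \text{e.g. } r_{MZ} = 0.50,\; r_{DZ} = 0.28
\;\Rightarrow\; h^2 \approx 2\,(0.50 - 0.28) = 0.44
```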

MULTI-OMICS STUDIES FOR THE INVESTIGATION OF CARDIOVASCULAR DISEASE
The simultaneous integration of multiple omics layers is a powerful approach for establishing gene causality. These layers include, but are not limited to, genomics, epigenomics, transcriptomics, proteomics, and metabolomics (Figure 1). Integrating them helps explain the mechanisms that connect identified genetic variation to cardiovascular diseases. In this approach, many sources of variability are combined in statistical models to identify the key drivers and pathways that contribute most to disease (25). Importantly, most of the risk variants for CAD and other cardiovascular diseases identified by GWAS (5,7,14,17,18,37,55,56) are located in noncoding (intronic or intergenic) regions of the genome. This suggests that these variants are likely to act through regulatory elements, such as enhancers and promoters, that bind transcription factors and regulate gene expression in cis or in trans (57). Earlier multi-omics studies of CAD mainly focused on integrating GWAS data with global transcriptomics using expression quantitative trait locus (eQTL) analysis. In recent years, high-throughput technologies have further facilitated the integration of omics data. This has helped identify causal genes and molecular mechanisms involved in the development of cardiovascular events in mice (13,37,39,41,58) and humans (36)(37)(38)(39)(48) (Table 1).
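At its simplest, an eQTL analysis tests whether the expression of a gene varies with the number of copies of an allele an individual carries. The sketch below assumes an additive model and uses simulated genotypes and expression values. The cohort size, allele frequency, and effect size are all hypothetical. Production pipelines typically also adjust for covariates such as ancestry principal components and hidden expression factors. They also test many SNP-gene pairs and apply a multiple-testing correction.

```python
import numpy as np
from scipy import stats


def eqtl_test(dosage, expression):
    """Additive eQTL test: regress expression on minor-allele dosage (0, 1, 2)."""
    result = stats.linregress(dosage, expression)
    return result.slope, result.pvalue


# Hypothetical simulated cohort: 500 individuals, minor allele frequency 0.3
rng = np.random.default_rng(0)
n, maf, true_beta = 500, 0.3, 0.5
dosage = rng.binomial(2, maf, size=n)  # minor-allele count per individual
expression = true_beta * dosage + rng.normal(0.0, 1.0, size=n)  # normalized expression

beta_hat, p_value = eqtl_test(dosage, expression)
print(f"beta = {beta_hat:.2f}, p = {p_value:.1e}")
```

A positive slope means that each additional copy of the minor allele raises expression. This is the direction of effect reported for SORT1 at the 1p13.3 locus.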

SUCCESS STORIES OF MULTI-OMICS STUDIES IN CARDIOVASCULAR DISEASES
Few studies have integrated multi-omics profiles to investigate the mechanisms underlying cardiovascular diseases. Even so, this approach has revealed the potential functions of previously identified GWAS loci and the mechanisms involved in these common diseases.
In this section, we summarize recent studies using multi-omics approaches that focus on the integration of genomics, epigenomics, transcriptomics, and proteomics.

Genomics, Transcriptomics, and Epigenomics
There is a large body of literature linking genetic variation with gene expression and/or epigenetic marks to understand how identified DNA variants may contribute to disease. One example of integrating genomics with transcriptomics is a study of the 9p21 locus (63), which earlier GWAS identified as one of the most significant loci for CAD (64,65). The association of this locus with CAD has been consistently replicated in multiple studies (56,66), although its causal link remained unclear. The locus contains several genes:
- CDKN2A, encoding the cyclin-dependent kinase inhibitors p16INK4a and p14ARF
- CDKN2B, encoding the cyclin-dependent kinase inhibitor p15INK4b
- MTAP, encoding methylthioadenosine phosphorylase
- the long non-coding RNA ANRIL

Integration of genetic and transcriptomic data identified ANRIL as the top candidate causal gene for CAD at the 9p21 region (63). Functional studies in cell lines suggested possible mechanisms that could explain the role of 9p21 in CAD (67,68). For instance, one study showed that alleles at the 9p21 locus were associated with different ANRIL isoforms (linear or circular). Linear transcripts were associated with atherosclerosis, whereas circular transcripts were protective against it. This process is mediated through the expression of multiple genes regulated in both cis and trans (69,70). Moreover, a recent study showed that ANRIL (DQ485454) is involved in several endothelial cell functions important to the development of CAD. These include monocyte adhesion to endothelial cells, trans-endothelial monocyte migration, and endothelial cell migration (71).
Another example is the investigation of the CELSR2-PSRC1-MYBPHL-SORT1 gene cluster at the 1p13.3 locus, which is associated with low-density lipoprotein cholesterol (LDL-C) levels and cardiovascular risk (55,72,73). eQTL analysis showed that SNPs associated with a lower risk of CAD at the 1p13.3 locus were also associated with increased expression of SORT1, PSRC1, and CELSR2. SORT1 displayed the largest expression change in the liver (73,74). This finding generated new hypotheses about the molecular mechanism linking the 1p13.3 locus to CAD development. SORT1 and PSRC1 were then overexpressed in mouse models of hyperlipidemia. PSRC1 overexpression had no metabolic effects. SORT1 overexpression, by contrast, significantly reduced plasma LDL-C and very low-density lipoprotein (VLDL) particle levels by modulating hepatic VLDL secretion, suggesting an important role of SORT1 in CAD (74).

Finally, a similar omics approach was applied in the Hybrid Mouse Diversity Panel (HMDP) to identify genes associated with isoproterenol-induced hypertrophy and heart failure (13, 22, 23, 41, 75-83). Integrating genomic information with the cardiac transcriptome identified several candidate causal genes that determined the degree of cardiac hypertrophy. Specifically, Hes1 was predicted to be involved in the progression of heart damage in cardiac hypertrophy (13). Knocking down Hes1 in ventricular myocytes reduced hypertrophy by up to 90%, confirming the role of Hes1 in cardiac hypertrophy (13).

FIGURE 1 | Multi-omics approach to identify the causal gene associated with LDL-C levels and CAD risk at the 1p13 locus.
(A) GWAS meta-analysis showed several SNPs at the 1p13 locus strongly associated with LDL-C levels (p = 1.0 × 10^-170) and CAD risk. The 1p13 locus contains several genes (squares). The haplotype most significantly associated with LDL-C comprises six SNPs in high linkage disequilibrium (LD). It is located between the CELSR2 and PSRC1 genes.
(B) Liver eQTL analysis showed that the minor haplotype was significantly associated with higher expression of CELSR2, PSRC1, and SORT1, with SORT1 showing the largest difference; modified from Musunuru et al. (74).
(C) Luciferase assays and the ENCODE database identified a common polymorphism at the 1p13 locus, rs12740374, that alters SORT1 expression in liver. The minor allele (T) creates a C/EBP (CCAAT/enhancer binding protein) transcription factor binding site, whereas the major allele (G) disrupts it. C/EBP transcription factors regulate the expression of hepatic genes involved in metabolism.
(D) SORT1 was knocked down with small interfering RNA (siRNA) and overexpressed with viral vectors in mouse liver. Altering SORT1 expression significantly changed plasma LDL-C and VLDL particle levels by modulating hepatic VLDL secretion.

More recently, several studies have demonstrated that epigenetic modifications are associated with CAD risk (38,42,43,47,49,59,61,62,84,85) and other CVD-related risk factors (61,62,84). Epigenetic changes investigated in the context of CVD include DNA methylation (38,43,49), chromatin organization (42), and microRNAs (47). In recent years, efforts have been made to identify interactions involving functional non-coding elements of the genome, such as enhancers. Enhancers are defined as cis-acting DNA sequences that can increase the transcription of genes (60,61,86). Several methods have been developed to identify these elements and interactions:
- chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq)
- chromosome conformation capture (3C, Hi-C)
- most recently, chromatin interaction analysis by paired-end tag sequencing (ChIA-PET)

Together, these technologies make it possible to map protein-DNA binding and long-range chromatin interactions genome-wide.
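A common first step when combining GWAS results with ChIP-seq peaks or other regulatory annotations is to ask which variants fall inside annotated regions. The sketch below assumes BED-style, 0-based, half-open coordinates. The peak and SNP identifiers and positions are hypothetical. Dedicated tools such as BEDTools perform the same kind of interval intersection at genome scale.

```python
from bisect import bisect_right


def merge_intervals(intervals):
    """Merge overlapping or touching half-open [start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def snps_in_peaks(peaks, snps):
    """Return ids of SNPs whose 0-based position lies inside any peak.

    peaks: iterable of (chrom, start, end) in half-open coordinates.
    snps: iterable of (snp_id, chrom, pos).
    """
    by_chrom = {}
    for chrom, start, end in peaks:
        by_chrom.setdefault(chrom, []).append((start, end))
    merged = {chrom: merge_intervals(iv) for chrom, iv in by_chrom.items()}
    starts = {chrom: [s for s, _ in iv] for chrom, iv in merged.items()}

    hits = []
    for snp_id, chrom, pos in snps:
        if chrom not in merged:
            continue
        # Index of the last merged peak that starts at or before pos
        i = bisect_right(starts[chrom], pos) - 1
        if i >= 0 and pos < merged[chrom][i][1]:
            hits.append(snp_id)
    return hits


# Hypothetical peaks and variants
peaks = [("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 50, 80)]
snps = [("snpA", "chr1", 120), ("snpB", "chr1", 300), ("snpC", "chr2", 79), ("snpD", "chr3", 10)]
print(snps_in_peaks(peaks, snps))  # ['snpA', 'snpC']
```

Merging peaks first keeps each per-chromosome lookup a single binary search. The end coordinate is exclusive, so snpB at position 300 is not reported.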

Adding Another Layer: Proteomics
The incorporation of protein expression profiles into multi-omics studies of CAD has been less explored than that of mRNA expression profiles (43-45, 47, 51-54). This may be due to the cost and the highly specialized expertise required for instrument operation, data acquisition, and analysis in quantitative proteomics (87). Recently, Emilsson et al. showed that co-expression protein modules associated with complex diseases are highly regulated by cis- and trans-acting genetic variants (88). Therefore, the integration of proteomic data can add valuable information about the molecular processes involved in the development of CAD.

One of the more interesting studies incorporating proteomic data in mice was conducted by Lau et al., who integrated protein dynamics in addition to genomic and proteomic data (51). This study identified modules associated with heart hypertrophy that were involved in cell adhesion, glycolytic processes, actin filament organization, translation, and sodium ion transport (51).

In another multi-omics study, Schlotter et al. investigated the mechanisms involved in calcified aortic valve disease (CAVD) (52). The authors performed global transcriptomics and proteomics of human stenotic valves to identify novel regulatory networks in CAVD. The potential molecular drivers of CAVD development and progression they identified included alkaline phosphatase, apolipoprotein B, matrix metalloproteinase activation, and mitogen-activated protein kinase. This approach also identified inflammation pathways as a significant contributor to CVD (52). The study emphasizes the value of extensive phenotypic characterization in multi-omics approaches, both for defining markers associated with disease subgroups and for designing more specific therapeutic strategies.
Frontiers in Cardiovascular Medicine | www.frontiersin.org
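Co-expression modules, such as the protein modules described by Emilsson et al., are typically derived by clustering features whose abundances are correlated across samples. The sketch below is a deliberately simplified version of that idea, not the method used in the cited studies. It clusters simulated features with average-linkage hierarchical clustering on a 1 - |r| distance. By contrast, tools such as WGCNA use soft-thresholded adjacency and topological overlap. The data, the two latent drivers, and the 0.5 cut-off are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def coexpression_modules(X, max_dist=0.5):
    """Assign a module label to each feature (column) of a samples x features matrix."""
    corr = np.corrcoef(X, rowvar=False)  # feature-by-feature Pearson correlation
    dist = 1.0 - np.abs(corr)  # unsigned co-expression distance
    np.fill_diagonal(dist, 0.0)
    condensed = squareform(dist, checks=False)  # condensed form expected by linkage
    tree = linkage(condensed, method="average")
    return fcluster(tree, t=max_dist, criterion="distance")


# Hypothetical data: 10 proteins driven by two independent latent factors
rng = np.random.default_rng(1)
n_samples = 200
f1 = rng.normal(size=n_samples)
f2 = rng.normal(size=n_samples)
X = np.column_stack(
    [f1 + 0.3 * rng.normal(size=n_samples) for _ in range(5)]
    + [f2 + 0.3 * rng.normal(size=n_samples) for _ in range(5)]
)
labels = coexpression_modules(X)
print(labels)
```

Each recovered module can then be tested for association with genotypes or disease traits, which is how module-level signals are linked to genetic variants.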
In summary, these studies showed that the knowledge generated from the integration of genomics, epigenomics, transcriptomics, and proteomics could provide initial insights into the identification of mechanisms for cardiovascular diseases.

METABOLOMICS AND METAGENOMICS STUDIES OF CAD
Metabolomics and metagenomics represent additional layers of complexity because they capture the influences of nutrient intake, utilization, and flux. These omics data have also proven useful for identifying biomarkers with potential clinical applicability (89). However, studies integrating metabolomics, lipidomics, or metagenomics data in the context of CAD are limited (Table 1). In a GWAS of metabolite levels, Suhre et al. (35) found several loci associated with both metabolite levels and a higher risk of CAD. These included the ABO, NAT2, CPS1, NAT8, ALPL, and KLKB1 genes. Interestingly, KLKB1 was associated with bradykinin concentrations as well as with a higher CAD risk. Bradykinin is a potent endothelium-dependent vasodilator that contributes to vasodilation and hypotension (90). These findings suggest that integrating metabolomic data with other omics data can help identify novel biomarkers for CVD diagnosis.

Regarding metagenomic data, only two studies of CVD so far have integrated metabolomics and metagenomics data (40,50) (Table 1). These studies identified bacterial taxa associated with CAD risk and with plasma metabolites. For example, the bacterial genus Veillonella was associated with chronic heart failure. It was also inversely correlated with metabolites known to be cardioprotective, such as niacin, cinnamic acid, and orotic acid (50). Nevertheless, these studies rely only on correlations and do not perform a formal integrative analysis of the data. This reflects both the complexity of the task and an opportunity to develop novel statistical approaches.
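Microbe-metabolite associations like those reported for Veillonella are usually screened with rank correlations. Because many taxon-metabolite pairs are tested at once, a false discovery rate correction follows. The sketch below uses Spearman correlation and the Benjamini-Hochberg procedure on simulated data with hypothetical metabolite names. It illustrates the correlation screening described in the text, not an integrative model.

```python
import numpy as np
from scipy.stats import spearmanr


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (q-values)."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity: each q-value is the minimum over all larger ranks
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.minimum(scaled, 1.0)
    return q


rng = np.random.default_rng(2)
n = 120
taxon = rng.lognormal(size=n)  # hypothetical relative abundance of one bacterial genus
metabolites = {
    "metabolite_A": -np.log(taxon) + rng.normal(scale=0.5, size=n),  # inversely related by construction
    "metabolite_B": rng.normal(size=n),  # unrelated by construction
    "metabolite_C": rng.normal(size=n),  # unrelated by construction
}

names = list(metabolites)
rhos, pvals = [], []
for name in names:
    rho, p = spearmanr(taxon, metabolites[name])
    rhos.append(rho)
    pvals.append(p)
qvals = benjamini_hochberg(pvals)

for name, rho, q in zip(names, rhos, qvals):
    print(f"{name}: rho = {rho:+.2f}, FDR q = {q:.2g}")
```

Rank correlation is a common choice here because taxon abundances are skewed and compositional. A significant correlation still says nothing about causal direction, which is the limitation noted above.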

INTEGRATION OF MULTI-OMICS, MULTI-ETHNIC, AND MULTI-SPECIES MODELS OF DISEASE
It has been suggested that comparing omics data between humans and animal models can make an important contribution to understanding the molecular mechanisms implicated in CAD (24). Studies in humans have greater translational potential. Studies in animal models, however, can help validate the biological relevance of human findings and recapitulate them under different environmental stimuli (22,24,78). This has been demonstrated in recent studies that integrated multi-omics approaches to study CAD in both humans and animal models (39,41).

An example of a large-scale integrative multi-omics approach is the study by Shu and colleagues, which included CAD and T2D GWAS data from five multi-ethnic studies (41). In this study, genetic and transcriptomic data from 16 tissues relevant to CAD were used to construct co-regulation networks for CAD and T2D (41). This network modeling identified pathways involved in lipid, glucose, and branched-chain amino acid metabolism. It also identified processes related to oxidation, extracellular matrix, immune response, and the neuronal system in CAD and T2D (41). Moreover, this strategy helped to dissect the molecular mechanism of HMGCR, identified as a top key driver for both CAD and T2D. Interestingly, the authors showed that HMGCR was associated with CAD and T2D in opposite directions: genetic variants in HMGCR that decrease CAD risk increase T2D risk. These findings could have important implications for the pharmacological treatment of both diseases.

Existing omics data from mice and humans, including microRNA, genomics, proteomics, and metabolomics data, are deposited in the cardiovascular disease database (C/VDdb). These data have also recently been integrated and analyzed to identify novel drivers of CVD.
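Network studies such as that of Shu et al. use key driver analysis. It asks whether the network neighborhood of a gene is enriched for disease-associated genes more than expected by chance. The sketch below is a simplified, unweighted version of that idea, using a hypergeometric test on a toy network. The gene names, network, and universe size are hypothetical. Published implementations, for example weighted key driver analysis in Mergeomics, differ in how they define neighborhoods and handle edge weights.

```python
from scipy.stats import hypergeom


def key_driver_pvalues(network, disease_genes, universe_size):
    """Hypergeometric enrichment of disease genes among each hub's direct neighbors.

    network: dict mapping a hub gene to the set of its direct neighbors.
    disease_genes: set of genes associated with the disease.
    universe_size: total number of genes considered (M in the hypergeometric test).
    """
    n_disease = len(disease_genes)
    pvalues = {}
    for hub, neighbors in network.items():
        n_draws = len(neighbors)
        n_hits = len(neighbors & disease_genes)
        # P(X >= n_hits) when drawing n_draws genes without replacement
        pvalues[hub] = hypergeom.sf(n_hits - 1, universe_size, n_disease, n_draws)
    return pvalues


# Toy network (hypothetical): HUB_A touches mostly disease genes, HUB_B does not
network = {
    "HUB_A": {f"G{i}" for i in range(1, 9)},  # neighbors G1..G8
    "HUB_B": {f"G{i}" for i in range(9, 17)},  # neighbors G9..G16
}
disease_genes = {"G1", "G2", "G3", "G4", "G5", "G6", "G9"}
pvals = key_driver_pvalues(network, disease_genes, universe_size=40)
for hub, p in sorted(pvals.items(), key=lambda kv: kv[1]):
    print(f"{hub}: p = {p:.2e}")
```

In practice, these p-values would be corrected for the number of hubs tested. Hubs that pass in networks for two diseases, as HMGCR did for CAD and T2D, become shared key driver candidates.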
To demonstrate the utility of the C/VD database, the authors performed an integrative analysis of these omics studies. It showed enrichment of lipid metabolism, extracellular matrix remodeling, inflammation, and cardiac hypertrophy pathways. Regulatory mechanisms mediated through miRNAs associated with the development of CAD were also reported (47). Altogether, these studies show that high-level integration approaches are powerful tools for extracting robust biological signals across molecular layers, phenotypes, tissue types, and even species. They also help prioritize new therapeutic avenues for cardiometabolic diseases. Of note, there is limited overlap in the metabolic regulators, co-expression modules, and key driver genes identified across different multi-omics studies of CVD. The exception is markers involved in lipid metabolism, which appear consistent across studies. This highlights the importance of lipid metabolism in the development of cardiovascular disorders (91)(92)(93). Discrepancies among these findings could be explained by differences in statistical tools, phenotypic characterization, ethnic origin, sex, and pathophysiological conditions (13, 23-25, 79, 94).

DATA INTEGRATION USING FREELY AVAILABLE PUBLIC DATABASES
Large, freely available biological databases allow genomic data to be integrated with other omics data, including transcriptomics, proteomics, and metabolomics. Examples include GTEx (95), ENCODE (ENCODE Project Consortium), Roadmap (Roadmap Epigenomics Consortium, 2015), Snyderome (96), and bioRxiv, to mention a few. One of the main advantages of these databases is that they allow simultaneous analysis of regulatory mechanisms in different tissues, which are usually difficult to obtain in genetic studies conducted in humans.

In this regard, the Genotype-Tissue Expression (GTEx) project is one of the most complete gene expression datasets currently available. It was generated as a resource for identifying genetic variants associated with changes in gene expression (expression quantitative trait loci, eQTLs). It contains a broad collection of tissues obtained from deceased donors. The most recent release at the time of writing, v7, provides 11,688 transcriptomes from 714 individuals and 53 tissues. GTEx also includes pathology and histology data, as well as donor characteristics such as ethnicity, age, and sex (95). To add information about potential molecular mechanisms, the Enhancing GTEx (eGTEx) project extends GTEx by combining gene expression with further assays:
- DNase I hypersensitivity
- ChIP-seq
- DNA and RNA methylation
- allele-specific expression (ASE)
- protein expression
- somatic mutation
- telomere length

The Encyclopedia of DNA Elements (ENCODE) project has identified and annotated a large number of functional elements in the human and mouse genomes. It uses diverse approaches, such as DNase I hypersensitivity, DNA methylation, and immunoprecipitation (IP) assays of proteins that interact with DNA and RNA. Its latest version includes over 35 high-throughput experimental methods in more than 250 different cell and tissue types, resulting in over 4,000 experiments. Like GTEx, ENCODE also includes relevant information about ethnicity, sex, and age (98).
Additional databases include Roadmap (99), which has an extensive collection of DNA methylation, histone modification, chromatin accessibility, and small RNA transcript data. Several studies of CAD have demonstrated the utility of these databases. Integrating them with genetic data has helped identify regulatory mechanisms and potential targets, and has guided functional validation. For example, ENCODE data were used to predict that the G allele of rs12740374 disrupts a C/EBP binding site. Functional studies then showed that this allele lowers SORT1 transcription in liver and increases VLDL secretion. This explains the association of the variant with LDL-C levels in genetic studies (Figure 1) (74). Integrating various data frameworks can therefore be highly successful in uncovering the mechanisms of disease manifestation.
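Predicting whether an allele creates or disrupts a transcription factor binding site, as in the rs12740374 example, is commonly done by scoring both alleles against a position weight matrix (PWM). The sketch below uses a toy motif built from a hypothetical consensus sequence, not the actual C/EBP matrix. It scans the forward strand only. Tools such as FIMO from the MEME suite scan both strands against curated motif databases.

```python
import math

BASES = "ACGT"


def counts_from_consensus(consensus, hits=17, misses=1):
    """Toy count matrix: the consensus base gets `hits`, every other base `misses`."""
    return [{b: (hits if b == c else misses) for b in BASES} for c in consensus]


def log_odds_pwm(counts, pseudocount=0.5, background=0.25):
    """Convert per-position base counts into log2-odds scores vs a uniform background."""
    pwm = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        pwm.append({b: math.log2((column[b] + pseudocount) / total / background) for b in BASES})
    return pwm


def best_motif_score(seq, pwm):
    """Highest PWM score over all windows of seq (forward strand only)."""
    width = len(pwm)
    return max(
        sum(pwm[i][seq[start + i]] for i in range(width))
        for start in range(len(seq) - width + 1)
    )


# Hypothetical motif and alleles: the alternative allele changes one base inside the site
pwm = log_odds_pwm(counts_from_consensus("GCTTAGC"))
ref_seq = "ACAC" + "GCTTAGC" + "ACAC"
alt_seq = "ACAC" + "GCTCAGC" + "ACAC"
ref_score = best_motif_score(ref_seq, pwm)
alt_score = best_motif_score(alt_seq, pwm)
print(f"ref = {ref_score:.2f}, alt = {alt_score:.2f}, delta = {alt_score - ref_score:.2f}")
```

A large drop in score for one allele flags a candidate disrupted site. Such predictions still need experimental confirmation, such as the luciferase assays used for rs12740374.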

FUTURE DIRECTIONS
The identification of causal genes is a critical step toward translating genetic loci into biological processes. The integration of omics strategies should accelerate the precise identification of novel molecular mechanisms implicated in CVD. This may eventually lead to the characterization of novel pathways and drug targets. Multi-omics approaches have been successfully applied to the investigation of cardiovascular diseases, but the number of such studies is still limited. These studies have primarily focused on the integration of genomics, transcriptomics, epigenomics, and proteomics. Metabolomics, metatranscriptomics, and metagenomics also show promise as tools for identifying biomarkers with potential clinical applicability. Integrating such data will increase our understanding of cardiovascular diseases and accelerate the identification of new diagnostic or therapeutic targets (100). Finally, research efforts should be directed toward applying multi-omics approaches and generating big data in more diverse populations, and toward investigating sex-specific mechanisms.

AUTHOR CONTRIBUTIONS
PL-M, JW, and AH-V drafted and edited the manuscript.

FUNDING
AH-V is funded by the NIH U54 DK120342 grant and NIH/CTSI UL1 TR00188. JW is funded by the NIH K08 HL133491 and NIH R01 HL129639.