Deriving Gene Networks from SNP Associated with Triacylglycerol and Phospholipid Fatty Acid Fractions from Ribeyes of Angus Cattle

The fatty acid profile of beef is a complex trait that can benefit from gene-interaction network analysis to understand relationships among loci that contribute to phenotypic variation. Phenotypic measures of fatty acid profile from triacylglycerol and phospholipid fractions of longissimus muscle, pedigree information, and Illumina 54 k bovine SNP genotypes were utilized to derive an annotated gene network associated with fatty acid composition in 1,833 Angus beef cattle. The Bayes-B statistical model was utilized to perform a genome wide association study to estimate associations between 54 k SNP genotypes and 39 individual fatty acid phenotypes within each fraction. Posterior means of the effects were estimated for each of the 54 k SNP and for the collective effects of all the SNP in every 1-Mb genomic window in terms of the proportion of genetic variance explained by the window. Windows that explained the largest proportions of genetic variance for individual lipids were found in the triacylglycerol fraction. There was almost no overlap in the genomic regions explaining variance between the triacylglycerol and phospholipid fractions. Partial correlations were used to identify correlated regions of the genome for the set of largest 1 Mb windows that explained up to 35% genetic variation in either fatty acid fraction. SNP were allocated to windows based on the bovine UMD3.1 assembly. Gene network clusters were generated utilizing a partial correlation and information theory algorithm. Results were used in conjunction with network scoring and visualization software to analyze correlated SNP across 39 fatty acid phenotypes to identify SNP of significance. Significant pathways implicated in fatty acid metabolism through GO term enrichment analysis included homeostasis of number of cells, homeostatic process, coenzyme/cofactor activity, and immunoglobulin. These results suggest different metabolic pathways regulate the development of different types of lipids found in bovine muscle tissues. Network analysis using partial correlations and annotation of significant SNPs can yield information about the genetic architecture of complex traits.


INTRODUCTION
Beef is a nutritious source of protein, fat, vitamins, and minerals when appropriately included in the human diet. A large body of research indicates it is critical to maintain a balanced fatty acid intake to support a healthy blood lipid profile (Ooi et al., 2013). The synthesis of adipose tissue in beef cattle is a complex biological process controlled by numerous loci as well as environmental factors. Considerable ongoing effort has been devoted to identification of genomic regions as well as candidate genes for fatty acid profile and adipose synthesis in various breeds of beef cattle, including Angus, Nellore, Brahman, Santa Gertrudis, Hereford, and Shorthorn (Barendse, 2011;Cesar et al., 2014;Kelly et al., 2014). The usefulness of these loci in DNA based beef cattle selection schemes will increase with knowledge of the genomic regions contributing to the development fat deposition (Saatchi et al., 2013).
The fatty acid profile in beef cattle can be characterized by the abundance of 39 individual lipids of varying chain lengths and degrees of saturation (Daley et al., 2010). The total lipids present in animal tissues can be separated into triacylglycerol and phospholipid fractions, which represent the two primary modes of lipid storage in cattle muscular tissue (Yen et al., 2008). The triacylglycerol fraction typically represents 70 to 92% of the total lipid in longissimus muscle depending on age and dietary composition (Warren et al., 2008). The most abundant fatty acids in the triacylglycerol fraction are 18:1, 16:0, and 18:0 (Daley et al., 2010). The fatty acids are primarily derived from the de novo synthesis of lipids from the fatty acid synthase (FASN) complex. The most abundant fatty acids in the phospholipid fraction include 18:2, 16:0, and 18:1, with the majority of lipids having at least one unsaturation. Less is known about the origins of these lipid species in phospholipid. Taken together, these data present a large number of phenotypes which are regulated by various networks of genes associated with lipid synthesis and metabolism. Given the increasing availability of SNP data, there has been an interest in developing computational methods that utilize GWAS results from multiple traits along with principles of co-association to identify clusters of SNP that likely regulate the underlying metabolic pathways (Reverter and Chan, 2008;Fortes et al., 2011;Reverter and Fortes, 2013).
Phenotypic correlations among fatty acids within and across lipid fractions have been published (Hoehne et al., 2012), while estimates of phenotypic variance and genetic parameters including correlations have been obtained for the data set used in this study (Buchanan et al., 2015). Following Reverter and Fortes (2013), correlations among multiple phenotypes can be exploited to develop an association weight matrix utilizing highthroughput data such as that obtained from GWAS in order to reveal genomic regions associated with a phenotype or multiple phenotypes of interest. This methodology utilizes a partial correlation and information theory algorithm (PCIT) to analyze an input matrix that contains data from GWAS across multiple phenotypes to generate clusters of loci that are highly associated with the overall trait of interest (Reverter and Chan, 2008). The utility of that analysis has been previously demonstrated through derivation of a regulatory gene network associated with puberty in beef cattle (Fortes et al., 2011). The objective of this study was to utilize the PCIT algorithm to identify clusters of genes affecting variation across 39 different lipid classes from both the triacylglycerol and phospholipid fatty acid fractions in Angus beef cattle.

Ethics Statement
The Iowa State University and Oklahoma State University Institutional Review Boards approved the experimental protocols used in this study.

Animals and Sample Collection
A total of 1,833 offspring of 155 Angus bulls represented by bulls (n = 450), steers (n = 1,022), and heifers (n = 361) were used in this study. All cattle were finished on concentrate diets typical for feedlot growth in Iowa (n = 908), California (n = 344), Colorado (n = 291), or Texas (n = 290). Animals were harvested at commercial facilities when they reached typical US market endpoints with an average age of 457 ± 46 days. Production characteristics and additional details of sample collection and preparation were reported previously (Garmyn et al., 2011). After external fat and connective tissue were removed, the 1.27-cm steaks from the longissimus muscle were analyzed for fatty acid composition at Iowa State University (Ames, IA, USA). Fatty acids were analyzed using thin layer and gas chromatography to separate the lipid fractions and derivatize to fatty acid methyl esters, respectively. Full details of fatty acid preparation and determination were previously described in detail (Zhang et al., 2008;Buchanan et al., 2015).

Genome-Wide Association Study of Fatty Acid Fractions
Genomic DNA was extracted from the ground beef sample used for fatty acid composition and was genotyped with the Bovine SNP50 Infinium II BeadChip (Illumina, San Diego, CA, USA). Contemporary groups were defined based on crossclassifications of gender at harvest (bull, steer or heifer), finishing location (California, Colorado, Iowa, Texas), and harvest date, for a total of 33 groups. Contemporary groups were fit as fixed effects in genomic analyses. Effects of SNP on each trait were estimated using the Bayes-B option of GenSel accessed through the BIGSGUI Version 0.9.2 (Kizilkaya et al., 2010). The Markov-chain Monte Carlo approach used to estimate the effect of each SNP involved a 1,000 iteration burn-in period followed by 40,000 iterations used to obtain posterior means of the effect of each SNP. The estimate of the proportion of genetic variation explained by each SNP and each 1 Mb window was obtained for all 39 fatty acid phenotypes for triacylglycerol and phospholipid fractions as 2p(1-p) times the square of the mean effect for that locus 1 . The windowBV yes option was used to form the posterior distribution of genetic variance explained by every 1 Mb window (Garrick and Fernando, 2013).

Correlations among SNP
Significant pair-wise correlations among SNP effects were calculated across the 39 fatty acid phenotypes using an optimized implementation of the PCIT algorithm. Briefly, this algorithm begins by calculating third-order partial correlation coefficients among all possible trios of SNPs. Within each trio, a direct correlation between any pair of SNPs that does not exhibit a significant partial correlation to the third SNP is considered an independent association, and thus not significant for network co-association. This methodology identifies SNP co-associations that occur more frequently over the range of phenotypes given as input. The full mathematical model for identifying the most co-associated SNPs across multiple phenotypes is explained in detail by Reverter and Chan (2008). The algorithm was optimized to run in parallel at the Texas Advanced Computing Center (Koesterke et al., 2013). For the triacylglycerol fraction an initial set of 454 SNP were selected from the 20 genomic windows of size 1 Mb with the largest posterior probability of association (Garrick and Fernando, 2013) from the 16:0 fatty acid phenotype. The posterior mean SNP effects for these 454 SNP from 16:0 was augmented with the effects from the other 38 fatty acid phenotypes. This 454 × 39 matrix of posterior mean SNP effects was used as the input for the PCIT algorithm to detect co-association of effects for any SNP across multiple fatty acids. All SNP pairs within the matrix were tested for a partial correlation with at least one other SNP in order to create clusters of associated genes. SNP pairs without a significant partial correlation to at least one other SNP were removed from the dataset and not used for subsequent network association analysis since they would appear isolated.
All SNP were selected from 20 genomic windows of size 1 Mb with the largest posterior probability of association from the 16:0 fatty acid phenotype in order to build a matrix of SNP effects for the phospholipid fraction. A vector of posterior mean SNP effects for 418 SNP from 16:0 was augmented with the effects estimated for each of the other 38 fatty acids. PCIT network creation and visualization proceeded identically to the methods described for the triacylglycerol fraction.
Correlations among SNP were used to construct networks of SNP that exhibited common effects across multiple fatty acids. Correlation between SNP pairs with a non-zero partial correlation to another SNP were input into Cytoscape 3.0.2 (Shannon et al., 2003) software to visualize gene network clusters using the MCODE plugin (Bader and Hogue, 2003;Saito et al., 2012). Networks are scored and ranked by the MCODE plugin as network density times the number of nodes. MCODE defines network density as the number of edges in a network divided by the theoretical maximum number of edges. SNP were annotated with the Variant Effect Predictor (VEP) using Bovine UMD 3.1 annotations (McLaren et al., 2010) accessed from the cattle genome analysis data repository (Koltes, 2012).

Gene Ontology Enrichment Analysis and Visualization
Gene ontology (GO) enrichment was carried out using the DAVID v6.7 Functional Annotation Tool (Huang da et al., 2009a,b) in order to identify biological terms associated with genomic regions and gene networks identified in the GWAS analysis. GO term enrichment and clustering was carried out on all annotated genes within the 20 1 Mb genomic windows associated with the triacylglycerol and phospholipid fractions, respectively. Reported results from the GO term enrichment and clustering include the overall enrichment score, the percentage of genes involved in the given term (%), significance of enrichment or EASE score (P-value), fold enrichment (FE), and false discovery rate (FDR). Details of the calculation for each of the parameters associated with the GO term enrichment in DAVID is reported by Huang da et al. (2009b). Ensembl Gene ID's were extracted from 1 Mb genomic regions from the Bos taurus UMD3.1 assembly for use in the GO term enrichment analysis.

Triacylglycerol Genome-Wide Association Study
The posterior mean estimates for proportion of genetic variance explained by 1 Mb genomic windows as well as the posterior probability of association for selected lipids and lipid classes from the triacylglycerol fraction are shown in Table 1. All posterior probabilities of association for the genomic windows displayed in Table 1 were greater than 97% (PPA > 0.97), which indicates a low (<3%) FDR in the model for these windows. This indicates these genomic regions or the regions in neighboring windows likely harbor loci or genomic features that exhibit large effects on the phenotype included in the analysis. Multiple genomic windows were identified which explained between 22.13 and 34.55% of genetic variance for individual lipids found in the triacylglycerol fraction. The genomic window on chromosome 19 at Mb 51 appears to describe a large proportion of the genetic variance across multiple fatty acids in the triacylglycerol fraction, including 14:0, 16:0, 16:1, 18:0, 18:1, saturated fatty acids (SFA), and monounsaturated fatty acids (MUFA). This region harbors the candidate gene FASN, which is known to be associated with primary lipid synthesis in adipose tissue (Zhang et al., 2008;Abe et al., 2009). It follows that this genomic window would explain a large proportion of genetic variance across multiple fatty acids and fatty acid classes due to triacylglycerol functioning as the main storage site for lipids of medium chain length synthesized from FASN .
Other genomic regions of significance that appeared in both this data set that separately considers the two fractions and the total fatty acid analysis presented by Saatchi et al. (2013) include windows on chromosome 29 (18 MB) and a region on chromosome 26 (20-21 Mb). The region on chromosome 29 accounts for up to 10.65% of the genetic variance in 14:0, 16:0, 16:1, 18:0, and 18:1. The region on chromosome 26 accounts For each lipid or lipid class the posterior mean of the genetic variance (σ 2 g ) explained by the 1 Mb window is given along with window position coordinates (UMD3.1), number of SNP, and posterior probability of association (PPA).
for up to 5.39% of the genetic variance in 14:0, 16:1, SFA, and MUFA. These regions also harbor candidate genes related to fatty acid synthesis and metabolism including the transcription factor thyroid responsive hormone (THRSP) and the coenzyme stearoyl co-A desaturase (SCD), respectively, as noted by Saatchi et al. (2013). Both of these genes have been previously associated with lipid metabolism in cattle (Graugnard et al., 2010). Several genomic regions of interest were found to explain relatively large proportions of genetic variance that were not detected in the data presented by Saatchi et al. (2013). A window on chromosome 17 at 16 Mb accounted for 22.91% genetic variance in 16:0. This region harbors the possible candidate gene inositol polyphosphate-4-phosphatase, type II (INPP4B). Cellular localization for INPP4B is in the cytoplasm, and the top Gene Ontology biological process entries for this gene include phospholipid metabolic process, and known associations in bovine indicate a larger role in bone remodeling 2 . No QTL have been reported in the Cattle QTL Database 3 (Hu et al., 2013)  For each lipid or lipid class an estimate of genetic variance (σ 2 g ) explained by the 1 Mb window is given along with window position coordinates, number of SNP, and posterior probability of association (PPA).

Phospholipid Genome-Wide Association Study
The posterior mean estimates for genetic variance explained by 1 Mb genomic windows as well as the posterior probabilities of association for selected lipids and lipid classes from the phospholipid fraction are shown in Table 2. In contrast to the triacylglycerol fraction, posterior estimates for individual lipids and lipid classes were lower, with only 7 windows displaying a posterior probability of association greater than 90% (PPA > 0.90). Windows that explained the majority of genetic variance in lipids from the phospholipid fraction exhibited almost no overlap with those from the triacylglycerol fraction. Several windows identified harbor candidate genes related to overall lipid and phospholipid metabolism. The genomic window on chromosome 16 at 4 Mb that accounted for 1.79% of the genetic variance in 14:0 harbors the candidate gene fructose-2,6biphosphatase 2 (PFKFB2). This gene is known to be involved in synthesis and degradation of fructose-2,6-bisphosphate, which is a regulatory molecule involved in glycolysis in eukaryotes (Hue and Rider, 1987). A QTL that spanned this region was identified previously in Angus in relation to 12th rib fat thickness (McClure FIGURE 1 | Highest scoring annotated SNP network from the triacylglycerol fraction derived from PCIT analysis. Location indicates significance of each node, with distance from the center indicating the number of connections. Each edge represents a correlation identified through PCIT analysis. Frontiers in Genetics | www.frontiersin.org et al., 2010). Also, the genomic window on chromosome 24 at 29 Mb accounted for 4.27% of genetic variance in 18:0 harbors the candidate gene N-cadherin (CDH2). This gene is known to be involved in cell-to-cell adhesion and has been associated with increased adipogenic proliferation in mice (Castro et al., 2004). Other novel genomic windows, including 1 Mb upstream and downstream from the identified regions, did not harbor any candidate genes previously implicated in fatty acid or membrane metabolism.

Triacylglycerol and Phospholipid Gene Networks
Some 394 of the 454 SNP entered into the PCIT for the triacylglycerol fraction were co-localized into 12 separate networks. Information detailing the top 5 network scoring results from the Cytoscape MCODE plugin for each triacylglycerolderived network is in Table 3. The two highest scoring SNP networks for triacylglycerol fraction obtained from PCIT output and subsequent visualization of nodes with Cytoscape are in Figures 1 and 2. Networks are scored and ranked by the MCODE algorithm as network density times the number of nodes. MCODE defines network density as the number of edges in a network divided by the theoretical maximum number of edges. Nodes that were not annotated to a gene or feature were removed from the figures for visual simplicity. Location within the network indicates significance of each node, with greater distance from the center indicating a lower number of overall connections and importance to the phenotype. Each edge represents a connection, or direct correlation, identified through PCIT analysis. Figure 1 displays the highest scoring triacylglycerol annotated sub-network (unannotated markers removed) from scoring with the Cytoscape MCODE plugin. The highest scoring network contained 56 nodes and 1487 edges, or connections. Figure 2 displays the second highest scoring annotated triacylglycerol network obtained with MCODE, which contained 92 nodes and 1699 edges. The clusters of genes represent scored networks derived using PCIT that function as molecular complexes related to the input phenotype. The full output including marker lists for each network from the MCODE analysis for both the triacylglycerol and phospholipid fractions is included in Supplementary Data S1.
Candidate genes involved in fatty acid metabolism found within these networks include THRSP, Acyl-CoA synthetase-5 (ACSL5), glycerol-3-phosphate acyltransferase muscle-type (GPAM), and coiled coil domain-containing 57 (CCD57). The candidate genes THRSP and GPAM have been previously identified as playing a role in lipid metabolism in beef and dairy adipose via the PPAR pathway (Graugnard et al., 2010;Ji et al., 2014;Moisa et al., 2014). The enzyme ACSL5 is found primarily in cells with a high triacylglycerol synthesis activity, which indicates a likely role in development of adipose (Bu and Mashek, 2010). CCD57 is known to be involved in DNA binding and regulation of gene expression. This gene is located next to FASN on chromosome 19, and has been previously associated with 14:0 content and transcripts have been detected in excess of FASN transcripts in second-lactation dairy cattle (Bouwman et al., 2014;Canovas et al., 2014). Overall, this methodology presents strong evidence that biologically relevant genes can be co-localized with a close relationship to triacylglycerol variation and assembly.
Annotated visualizations of the two highest scoring phospholipid networks with Cytoscape are in Figures 3 and 4. The network in Figure 3 is the highest scoring phospholipid network containing 52 nodes and 987 edges. Figure 4 displays the second highest scoring phospholipid network which contained 65 nodes and 1,034 edges. Multiple genes annotated within these networks are known to be involved in cellular trafficking and cell integrity functions associated with the phospholipid membrane. Multiple genes that are known to be involved in membrane binding and cellular trafficking were identified including myosin-IXB (MYO9B), FCH domain only protein 1 (FCHO1), and ADAM metallopeptidase domain (ADAM11). Overall, there were relatively fewer genes identified in the phospholipid fraction analysis that were annotated to genes involved in membrane and lipid metabolism. This difficulty FIGURE 3 | Highest scoring annotated SNP network from the phospholipid fraction derived from PCIT analysis. Location indicates significance of each node, with distance from the center indicating the number of connections. Each edge represents a correlation identified through PCIT analysis.
in identifying associated genes could be partly due to the low variance observed in those phenotypes.

Gene Ontology Term Enrichment Analysis
Gene ontology term enrichment analysis was carried out for all genes located in the top 1 Mb regions shown in Tables 1 and 2 using DAVID Functional Classification Clustering tools. Genes were obtained by extracting ENSEMBL Gene ID features from the regions of interest. Significant results for the DAVID Functional Annotation Clustering and Functional Annotation Chart results for the top GWAS regions for both fractions are in Table 4 through Table 7. The top GO term clusters for the top 1 Mb windows associated with the triacylglycerol fraction are in Table 4. DAVID Functional Annotation Clusters are considered significant above an enrichment score of 1.3. In the functional annotation chart GO terms are considered significantly enriched at a P-value of 0.05 or less (Huang da et al., 2009b). The FDR is provided in the Functional Annotation Chart to determine emphasis to be placed on terms considered significant through the P-value statistic.
The full list of GO terms is displayed in the Functional Annotation Chart results in Table 5. Two clusters were produced for the triacylglycerol fraction with an enrichment score above 1.3. Significant GO terms featured in the Functional Clusters and Functional Annotation Chart included homeostatic process (GO:0042592), homeostasis of number of cells (GO:0048872), dendrite development (GO:0016358), and activation of MAP kinase activity (GO:0000187). These terms appear to be associated with features relating to cellularity and energy homeostasis pathways, which have relevance to fatty acid deposition and adipogenesis. The link between adipose tissue cellularity and fatty acid profile has been previously established (Costa et al., 2012). These results highlight the FIGURE 4 | Second highest scoring annotated SNP network from the phospholipid fraction derived from PCIT analysis. Location indicates significance of each node, with distance from the center indicating the number of connections. Each edge represents a correlation identified through PCIT analysis.
genetic involvement of cellular homeostasis in triacylglycerol metabolism.
Functional Annotation Clustering analysis for the top GWAS regions associated with phospholipid is in Table 6 and the full list of significant GO terms from the Functional Annotation Chart is in Table 7. Only one significant GO term cluster was identified for phospholipid regions with an annotation cluster score above 1.3. Top significant enriched GO terms included protein serine/threonine kinase activity (GO:0004674), immunoglobulin mediated immune response (GO:0002455), and plasma protein inflammatory response (GO:0002541). There is evidence for a link between prolonged protein kinase activation and a cellular signaling cascade that may result in the degradation of lipid membrane constituents (Nishizuka, 1995). However, without further evidence it is not immediately apparent how this term enrichment relates to lipid membrane metabolism. There also appear to be a large number of enriched terms associated with immune response pathways, suggesting a possible role in unsaturated lipid signaling in these processes. Statistics associated with GO terms include percentage of genes involved in the given term (%), significance of enrichment or EASE score (P-value), and fold enrichment (FE). Statistics associated with GO terms include percentage of genes involved in the given term (%), significance of enrichment or EASE score (P-value), fold enrichment (FE), and false discovery rate (FDR).

CONCLUSION
Analysis of GWAS results for triacylglycerol and phospholipid fractions supports the conclusion that the triacylglycerol fraction is closely representative of the total fatty acid fraction.
Significant genomic windows identified for triacylglycerol fraction overlapped with the results previously presented by (Saatchi et al., 2013) based on total fatty acids in the same samples. This is expected since the phenotypic measurements of the predominant triacylglycerol fraction are similar to those Statistics associated with GO terms include percentage of genes involved in the given term (%), significance of enrichment or EASE score (P-value), and fold enrichment (FE). Statistics associated with GO terms include percentage of genes involved in the given term (%), significance of enrichment or EASE score (P-value), fold enrichment (FE), and false discovery rate (FDR).
for the total fatty acids. It follows that the genomic regions that likely harbor genes and features related to the total fatty acids would also be identified in the triacylglycerol. The triacylglycerol fraction also exhibits a much larger genetic variance when compared to the phospholipid fraction. This methodology can yield candidate markers associated with intramuscular adipose accumulation as well as an enriched set of biological functions representative of fatty acid deposition in beef cattle. An analysis of the genomic regions that affect the phospholipid fraction yielded few genes with a known biological association to lipid metabolism. Significant genomic regions identified explained lower percentages of genetic variance in comparison to the triacylglycerol. The low variation observed in the phospholipid fraction is likely due to the importance of the phospholipid membrane to biological function of the cell. Pathways prevalent in the phospholipid analysis appeared to be highly related to cell-to-cell adhesion, cellular trafficking, and coenzyme/cofactor activity. A larger dataset could possibly improve results when dealing with traits that exhibit low phenotypic variance. In conclusion, the combination of GWAS results with PCIT and network visualization represents a robust methodology for identifying candidate genes of interest for traits with multiple phenotypes and adequate phenotypic variance.

AUTHOR CONTRIBUTIONS
JB conducted the analysis and drafted the manuscript. JR conceived the analysis and assisted with the manuscript. DG assisted with the analysis and manuscript. QD assisted with data collection and analysis. DB assisted with he analysis and the manuscript. JK conceived the analysis and assisted with the manuscript. MS conceived the analysis and assisted with the manuscript. LK assisted with the methodology and and analysis. RM conceived the analysis and assisted with the manuscript.

FUNDING
This research was supported by Pfizer Animal Genetics.