Using a Two-Sample Mendelian Randomization Method in Assessing the Causal Relationships Between Human Blood Metabolites and Heart Failure

Background: Heart failure (HF) is a leading cause of morbidity and mortality worldwide, and metabolic dysfunction is an important factor in HF pathogenesis and development. However, the causal effect of blood metabolites on HF remains unclear. Objectives: Our chief aim was to investigate the causal relationships between human blood metabolites and HF risk. Methods: We used an unbiased two-sample Mendelian randomization (MR) approach to assess the causal relationships between 486 human blood metabolites and HF risk. Exposure information was obtained from Sample 1, the largest metabolome-based genome-wide association study (mGWAS) dataset, which contains 7,824 Europeans. Outcome information was obtained from Sample 2, which is based on the results of a large-scale GWAS meta-analysis of HF and contains 47,309 cases and 930,014 controls of European ancestry. The inverse variance weighted (IVW) model was used as the primary two-sample MR analysis method, followed by sensitivity analyses including a heterogeneity test, a horizontal pleiotropy test, and a leave-one-out analysis. Results: We observed that 11 known metabolites were potentially related to the risk of HF using the IVW method (P < 0.05). After adding another four MR models and performing sensitivity analyses, we found that a 1-SD increase in the xenobiotic 4-vinylphenol sulfate was associated with a ~22% higher risk of HF (OR [95% CI], 1.22 [1.07–1.38]). Conclusions: Using a two-sample MR approach, we found that 4-vinylphenol sulfate may nominally increase the risk of HF by 22%. Our findings may provide novel insights into the pathogenesis underlying HF and novel strategies for HF prevention.


INTRODUCTION
Heart failure (HF) is a major public health problem and imposes a considerable burden on society (1). Although great progress has been made in the treatment of HF, its morbidity and mortality continue to rise (2). The heritability of HF is estimated at ∼26% (3). Previous genome-wide association studies (GWAS) have identified a few genetic loci for HF (4), but their roles in its etiology are unclear. As functional intermediates, circulating metabolites can reflect the biological links between an individual's genetic makeup and the development of disease. Metabolic dysfunction has been proposed as an important contributor to HF (5), and metabolomic studies have identified a number of circulating metabolites associated with HF (6,7). However, the causal relationships between these metabolites and HF are unclear, and translating the metabolic findings into pathophysiological mechanisms and novel therapies is difficult. Hence, a comprehensive analysis is needed to uncover the interactions between genetics and circulating metabolites in the pathogenesis of HF.
The basic idea of Mendelian randomization (MR) is to use genetic variants that are strongly associated with an exposure as instrumental variables (IVs) in order to infer the causal effect of that exposure on an outcome (8). To date, some MR studies have explored causation between exposures and heart failure, but they focused mainly on single or routine exposure factors, such as brain natriuretic peptide (9), interleukin-6 (10), and heart rate (11). Few studies have focused on blood metabolites, especially at the scale of the metabolome. A previous study conducted two-sample MR analysis on 486 blood metabolites and five major psychiatric disorders. It identified several disease-linked metabolites (12), providing novel insights into the metabolic mechanisms of psychiatric disorders. However, no study investigating the causal relationships between blood metabolites and the risk of HF has been reported. Hence, we used a two-sample MR approach to assess the causal relationships between 486 human blood metabolites and the risk of HF, in order to provide a deeper understanding of the pathogenesis of HF.

Study Design and Data Resources
All data used in this study came from publicly available datasets, for which ethics approval was obtained in the original studies.
The study flow is illustrated in Figure 1. Exposure information was obtained from Sample 1, which is the largest mGWAS dataset, published by Shin et al. (13) in 2014, and contains 7,824 Europeans. After strict quality control, ∼2.1 million single nucleotide polymorphisms (SNPs) and 486 blood metabolites (including 309 known metabolites and 177 unknown metabolites) were employed. These metabolites can be split into eight major categories: carbohydrates, amino acids, nucleotides, cofactors and vitamins, lipids, peptides, energy products, and xenobiotic metabolites. Summary data of all the mGWAS results in Sample 1 are publicly available on a database website (http://metabolomics.helmholtz-muenchen.de/gwas/).
Outcome information was obtained from Sample 2, which is based on the results of a large-scale GWAS meta-analysis conducted by Shah et al. (4) in 2020 on 26 studies of HF. This dataset contains 47,309 cases and 930,014 controls of European ancestry, and ∼8.3 million SNPs were employed in association analyses. The summary data of HF GWAS in Sample 2 were downloaded from the CVDKP Datasets website (http://www.kp4cd.org/datasets/mi).
Abbreviations: HF, heart failure; MR, Mendelian randomization; mGWAS, metabolome-based genome-wide association study; GWAS, genome-wide association study; IV, instrumental variable; SNP, single nucleotide polymorphism; LD, linkage disequilibrium; IVW, inverse variance weighted.

Quality Control of IV
A unified set of selection criteria was applied to the genetic variants for all 486 metabolites in this study. For the preliminary selection of IVs, we used a relatively relaxed P-value threshold that is widely used in MR analysis (8), namely P < 1 × 10⁻⁵. We then performed linkage disequilibrium (LD) clumping to obtain independent genetic instruments, using a stringent criterion [LD cutoff of r² = 0.001 within a 10,000 kb window in the 1000 Genomes Project (14) European (EUR) reference panel]. Metabolites in similar metabolic pathways may be regulated by the same SNPs, and when multiple metabolites are significantly associated with the same IVs, the MR assumptions may be violated. Hence, we applied a restricted selection of IVs (15) that excluded SNPs significantly related to more than two metabolites. In addition, we searched PubMed with the keywords [(HF) OR (heart failure) AND (SNP) OR (GWAS)] and collected SNPs reported in the published literature as related to HF (including various types, e.g., dilated cardiomyopathy, incident systolic heart failure, advanced heart failure, and congestive heart failure) or its risk factors (such as interleukin-6, ejection fraction, heart rate, and aortic root size). After sorting and merging, we removed these disease-related SNPs (Supplementary Table 1) as well as duplicate SNPs. Finally, the remaining unique SNPs were used for subsequent analysis.
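The filtering steps described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' pipeline: the function name, the tuple layout, and the toy SNP identifiers are hypothetical, LD clumping is assumed to have been done beforehand, and the "more than two metabolites" cutoff is taken from the text of this section.

```python
from collections import defaultdict

P_THRESHOLD = 1e-5            # preliminary threshold quoted in the text
MAX_METABOLITES_PER_SNP = 2   # SNPs tied to more metabolites than this are dropped


def select_instruments(associations, excluded_snps=frozenset()):
    """Return {metabolite: [snp, ...]} after the filters described above.

    associations: iterable of (snp, metabolite, p_value) tuples, assumed to be
    LD-clumped already and to list each (snp, metabolite) pair at most once.
    excluded_snps: SNPs reported for HF or its risk factors (hypothetical input).
    """
    significant = [(snp, met) for snp, met, p in associations if p < P_THRESHOLD]

    # Count how many metabolites each SNP is significantly associated with.
    metabolites_per_snp = defaultdict(set)
    for snp, met in significant:
        metabolites_per_snp[snp].add(met)

    instruments = defaultdict(list)
    for snp, met in significant:
        if snp in excluded_snps:
            continue  # disease-related SNP from the literature search
        if len(metabolites_per_snp[snp]) > MAX_METABOLITES_PER_SNP:
            continue  # SNP shared by too many metabolites
        instruments[met].append(snp)
    return dict(instruments)


# Toy data with made-up identifiers, for illustration only.
toy = [
    ("rs1", "A", 1e-6), ("rs1", "B", 1e-7), ("rs1", "C", 1e-8),  # 3 metabolites
    ("rs2", "A", 2e-6),
    ("rs3", "A", 1e-3),                                          # not significant
    ("rs4", "B", 5e-6),                                          # on exclusion list
    ("rs5", "B", 1e-9), ("rs5", "C", 0.5),
]
selected = select_instruments(toy, excluded_snps={"rs4"})
```

In this toy run only rs2 (for metabolite A) and rs5 (for metabolite B) survive; metabolite C is left without instruments.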

MR Analysis
The inverse variance weighted (IVW) model was used as the primary two-sample MR analysis model. IVW was proposed by Burgess et al. (16) and is commonly used in MR studies with multiple IVs. The method is valid on the premise that the IVs satisfy the assumptions of relevance, independence, and exclusion restriction, that is, the genetic variants affect the outcome only through the exposure. When these assumptions hold, the IVW method provides the greatest power to detect a causal effect. We considered a metabolite to have a strong causal relationship with HF risk if its IVW P-value passed the multiple-testing adjusted threshold (P < 0.05/486 = 1.03 × 10⁻⁴). However, given that the causal effects of blood metabolites on the risk of HF may be modest, a strict threshold might lead to the loss of some potential signals. Hence, we also considered potentially causal metabolites (1.03 × 10⁻⁴ < P < 0.05) and added four extra MR models to test their causal effects, namely, MR-Egger regression (17), the weighted median method (18), the simple mode-based estimator (19), and the weighted mode-based estimator (19).
MR-Egger regression is a weighted linear regression of the IV-outcome effects on the IV-exposure effects (17). Whereas the IVW regression line is forced through the origin (intercept fixed at zero), MR-Egger estimates the intercept as a free parameter, and this intercept measures the horizontal pleiotropy of the genetic variants. An intercept that differs from zero indicates pleiotropy; the advantage of MR-Egger is that it can still provide an unbiased estimate when the IVs are pleiotropic. MR-Egger thus relaxes the exclusion-restriction assumption that the IVW method places on the IVs, namely that they affect the outcome only through the exposure. Instead, it requires only the InSIDE (instrument strength independent of direct effect) assumption: the direct effects of the IVs on the outcome are independent of the associations between the IVs and the exposure. Note that all IV-exposure associations are oriented in the same direction in the analysis. Although the IV assumptions can be evaluated through the MR-Egger intercept, MR-Egger has less power than the IVW approach to detect causality (20).
The weighted median method computes the ratio estimate for each selected SNP and takes the median of the resulting weighted empirical distribution function (18). It provides an asymptotically consistent estimate of the causal effect as long as at least half of the weight comes from valid SNPs, and so it can reduce bias in the causal estimate even when some of the instruments are invalid.
The simple mode-based estimator groups SNPs by their causal effect estimates, so that SNPs with similar values form a cluster, and takes the causal effect from the cluster containing the largest number of SNPs. The weighted mode-based estimator additionally weights the causal effect estimate of each SNP and returns the estimate from the cluster with the largest total SNP weight. The mode-based estimators give a consistent estimate of the causal effect under the ZEMPA (Zero Modal Pleiotropy Assumption), that is, across all genetic variants, the mode of the bias term is 0 (19). In brief, if the five MR models described above produced similar estimates of the causal effect and showed significant P-values (P < 0.05) in at least three models (including IVW), we considered the metabolite a candidate causal feature for HF risk.
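To make the IVW estimate and the multiple-testing threshold concrete, the following minimal Python sketch computes a fixed-effect IVW estimate from per-SNP summary statistics. It is an illustration under stated assumptions rather than the software used in the study: the function name and the toy inputs are hypothetical, the weights are the usual first-order weights (squared SNP-exposure effect divided by the squared outcome standard error), and the P-value uses a normal approximation.

```python
import math

N_METABOLITES = 486
BONFERRONI = 0.05 / N_METABOLITES   # about 1.03e-4, as quoted in the text


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate from per-SNP summary statistics.

    Each SNP contributes the ratio beta_outcome / beta_exposure, weighted by
    beta_exposure**2 / se_outcome**2.
    """
    weights = [(bx / se) ** 2 for bx, se in zip(beta_exposure, se_outcome)]
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    total = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total
    std_error = math.sqrt(1.0 / total)
    z = estimate / std_error
    p_value = math.erfc(abs(z) / math.sqrt(2.0))   # two-sided normal P-value
    return estimate, std_error, p_value


# Toy summary statistics (made up): every SNP implies the same ratio of 0.2.
bx = [0.1, 0.2, 0.3]
by = [0.02, 0.04, 0.06]
se = [0.01, 0.01, 0.01]
estimate, std_error, p_value = ivw_estimate(bx, by, se)
odds_ratio = math.exp(estimate)     # log-odds estimate converted to an OR
passes_bonferroni = p_value < BONFERRONI
```

When the outcome effects are log-odds, as in a case-control GWAS, exponentiating the estimate gives the odds ratio per unit increase in the exposure.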
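The difference between IVW and MR-Egger comes down to whether the regression line is allowed an intercept. The sketch below, a simplified illustration with hypothetical names and toy values, fits the weighted regression with a free intercept after orienting every SNP-exposure association to be positive. It returns point estimates only; standard errors and the intercept test are omitted.

```python
def mr_egger(beta_exposure, beta_outcome, se_outcome):
    """Weighted regression of SNP-outcome on SNP-exposure effects with intercept.

    Returns (intercept, slope). The slope is the causal estimate and the
    intercept reflects average directional pleiotropy.
    """
    # Orient each SNP so that its exposure association is positive.
    x = [abs(bx) for bx in beta_exposure]
    y = [by if bx >= 0 else -by for bx, by in zip(beta_exposure, beta_outcome)]
    w = [1.0 / se ** 2 for se in se_outcome]

    sum_w = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / sum_w
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / sum_w
    sxx = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Toy data (made up): after orientation, outcome = 0.01 + 0.2 * exposure exactly,
# i.e. a causal effect of 0.2 plus a constant pleiotropic offset of 0.01.
intercept, slope = mr_egger(
    beta_exposure=[0.1, -0.2, 0.3],
    beta_outcome=[0.03, -0.05, 0.07],
    se_outcome=[0.01, 0.02, 0.01],
)
```

An IVW fit through the origin would absorb the offset into the slope, whereas the free intercept here separates the two.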
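As a rough illustration of the weighted median idea, the sketch below orders the per-SNP ratio estimates, builds the weighted empirical distribution, and interpolates at the 50th percentile. The function name and toy values are hypothetical, and the inverse-variance weights are an assumption about the weighting scheme rather than a detail stated in this section.

```python
def weighted_median(beta_exposure, beta_outcome, se_outcome):
    """Weighted median of per-SNP ratio estimates (needs at least two SNPs)."""
    if len(beta_exposure) < 2:
        raise ValueError("at least two SNPs are required")

    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    weights = [(bx / se) ** 2 for bx, se in zip(beta_exposure, se_outcome)]

    order = sorted(range(len(ratios)), key=ratios.__getitem__)
    r = [ratios[i] for i in order]
    w = [weights[i] for i in order]
    total = sum(w)

    # Position of each SNP on the weighted empirical distribution (mid-point rule).
    positions, running = [], 0.0
    for wi in w:
        running += wi
        positions.append((running - 0.5 * wi) / total)

    # Linear interpolation between the two SNPs that bracket the 50% point.
    below = max(i for i, pos in enumerate(positions) if pos < 0.5)
    span = positions[below + 1] - positions[below]
    return r[below] + (r[below + 1] - r[below]) * (0.5 - positions[below]) / span


# Toy data (made up): equal weights, ratios 0.1, 0.2 and an outlying 0.9.
median_estimate = weighted_median(
    beta_exposure=[0.1, 0.1, 0.1],
    beta_outcome=[0.01, 0.02, 0.09],
    se_outcome=[0.01, 0.01, 0.01],
)
```

With these equal weights an IVW average would be pulled to 0.4 by the outlying SNP, while the weighted median stays at about 0.2.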
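The candidate rule at the end of this paragraph can be written as a small predicate. In this sketch "similar estimates" is operationalized as all five models agreeing on the direction of effect, which is an assumption made here for illustration; the function name is hypothetical. The P-values in the usage example are those reported in the Results for the two highlighted metabolites, and the directions follow the text (increased risk for 4-vinylphenol sulfate, decreased risk for 1-arachidonoylglycerophosphoethanolamine).

```python
def is_candidate(results, alpha=0.05, min_significant=3, primary="IVW"):
    """results maps model name -> (direction, p_value), direction being +1 or -1.

    A metabolite qualifies when all models agree on direction, the primary
    model is significant, and at least `min_significant` models are significant.
    """
    directions = {direction for direction, _ in results.values()}
    significant = [name for name, (_, p) in results.items() if p < alpha]
    return (
        len(directions) == 1
        and primary in significant
        and len(significant) >= min_significant
    )


vinylphenol_sulfate = {
    "IVW": (+1, 2.16e-3),
    "MR-Egger": (+1, 1.69e-1),
    "Weighted median": (+1, 3.34e-3),
    "Simple mode": (+1, 8.7e-2),
    "Weighted mode": (+1, 2.37e-2),
}
arachidonoyl_gpe = {
    "IVW": (-1, 3.05e-2),
    "MR-Egger": (-1, 2.48e-2),
    "Weighted median": (-1, 1.4e-2),
    "Simple mode": (-1, 1.2e-1),
    "Weighted mode": (-1, 6.72e-2),
}
flags = (is_candidate(vinylphenol_sulfate), is_candidate(arachidonoyl_gpe))
```

Both metabolites satisfy the rule, each with three significant models including IVW.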

Sensitivity Analysis
Owing to differences in experimental conditions, analytical platforms, and study subjects, there may be heterogeneity in two-sample MR analyses, which can bias the estimation of causal effects. We therefore tested for heterogeneity in both the IVW analysis and the MR-Egger regression. If P > 0.05 in the test, there is no evidence of heterogeneity among the included IVs, and the influence of heterogeneity on the estimation of causal effects can be ignored.
When IVW is used to explore a causal relationship, unknown confounding through genetic pleiotropy may bias the estimation of causal effects. Hence, we tested for horizontal pleiotropy by examining the intercept of the MR-Egger regression and its P-value. If the intercept was close to 0 (<0.1) and P > 0.05, we considered that there was no evidence of horizontal pleiotropy. In addition, we used the MR-PRESSO method, as implemented in the MR-PRESSO package (21), to further test for horizontal pleiotropy and possible outliers.
After the heterogeneity and horizontal pleiotropy tests, we performed a leave-one-out sensitivity analysis on the qualifying metabolites. In this method, the SNPs are removed one at a time, and the pooled effect of the remaining SNPs is recalculated to evaluate the influence of each SNP on the overall estimate. If the overall estimate does not change considerably after the exclusion of any SNP (i.e., none of the confidence intervals crosses 0), the result is considered reliable.
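The section does not name the heterogeneity statistic. A common choice for the IVW model is Cochran's Q, and the sketch below assumes that choice purely for illustration. The function names and toy values are hypothetical, and the chi-square tail probability is computed with a simple series that is adequate for moderate values of Q.

```python
import math


def chi2_sf(x, df):
    """Upper-tail chi-square probability via the lower incomplete gamma series."""
    if x <= 0:
        return 1.0
    a, half = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    k = 1
    while term > 1e-16 * total and k < 10_000:
        term *= half / (a + k)
        total += term
        k += 1
    lower = math.exp(a * math.log(half) - half - math.lgamma(a)) * total
    return max(0.0, 1.0 - lower)


def cochran_q(beta_exposure, beta_outcome, se_outcome):
    """Cochran's Q around the IVW estimate; returns (Q, degrees of freedom, P)."""
    weights = [(bx / se) ** 2 for bx, se in zip(beta_exposure, se_outcome)]
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    ivw = sum(w * r for w, r in zip(weights, ratios)) / sum(weights)
    q = sum(w * (r - ivw) ** 2 for w, r in zip(weights, ratios))
    df = len(ratios) - 1
    return q, df, chi2_sf(q, df)


# Toy data (made up): ratios 0.1, 0.2, 0.3 with equal weights of 100.
q, df, p_heterogeneity = cochran_q(
    beta_exposure=[0.1, 0.1, 0.1],
    beta_outcome=[0.01, 0.02, 0.03],
    se_outcome=[0.01, 0.01, 0.01],
)
no_evidence_of_heterogeneity = p_heterogeneity > 0.05
```

In the toy example Q is 2 on 2 degrees of freedom, so the P-value is well above 0.05 and the instruments would be treated as homogeneous under the rule stated above.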
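The leave-one-out procedure can be sketched as a loop that drops each SNP in turn and re-estimates the pooled effect. This is an illustrative sketch with hypothetical function names and made-up SNP labels; it assumes a fixed-effect IVW estimate as the pooled effect and a normal-approximation 95% confidence interval.

```python
import math


def ivw(bx, by, se):
    """Fixed-effect IVW estimate and standard error."""
    weights = [(x / s) ** 2 for x, s in zip(bx, se)]
    total = sum(weights)
    estimate = sum(w * y / x for w, x, y in zip(weights, bx, by)) / total
    return estimate, math.sqrt(1.0 / total)


def leave_one_out(snps, bx, by, se, z=1.96):
    """Return (dropped_snp, estimate, ci_low, ci_high) for each omitted SNP."""
    rows = []
    for i, snp in enumerate(snps):
        keep = [j for j in range(len(snps)) if j != i]
        est, err = ivw([bx[j] for j in keep], [by[j] for j in keep],
                       [se[j] for j in keep])
        rows.append((snp, est, est - z * err, est + z * err))
    return rows


def is_stable(rows):
    """True when no leave-one-out confidence interval crosses zero."""
    return all(low > 0 or high < 0 for _, _, low, high in rows)


# Toy case 1 (made up): all SNPs agree, so dropping any one changes nothing.
consistent = leave_one_out(["snp_a", "snp_b", "snp_c", "snp_d"],
                           [0.1, 0.2, 0.3, 0.4],
                           [0.02, 0.04, 0.06, 0.08],
                           [0.01, 0.01, 0.01, 0.01])

# Toy case 2 (made up): the signal rests on snp_c alone.
driven = leave_one_out(["snp_a", "snp_b", "snp_c"],
                       [0.1, 0.1, 0.4],
                       [0.0, 0.0, 0.08],
                       [0.01, 0.01, 0.01])
stability = (is_stable(consistent), is_stable(driven))
```

In the second toy case the interval obtained after dropping snp_c spans zero, which is the pattern that would flag a result as driven by a single instrument.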

Pathway and Enrichment Analysis
We performed pathway and enrichment analysis of 11 HF-related known metabolites (P < 0.05, IVW method) through the online metabolomics data analysis website [(22)

IV Information
A total of 39,142 SNPs were significantly associated with the 486 metabolites (P < 1 × 10⁻⁵) in Sample 1. After LD analysis, 9,485 relatively independent SNPs remained. Among these 9,485 SNPs, 335 were associated with at least two metabolites, and none was associated with HF or its risk factors (see Methods). We excluded the confounding SNPs and matched the remainder against the SNPs in Sample 2 (Figure 1). Finally, 8,656 (94.6%) SNPs were selected for subsequent analyses. Five metabolites with fewer than three or more than 100 IVs were excluded from the subsequent MR analyses to ensure stable and reliable statistical results.

MR Analysis Results
In this study, the IVW model was used as the primary method for estimating the causal relationships between the blood metabolites and HF risk. No metabolite passed the multiple-testing adjusted threshold (P < 1.03 × 10⁻⁴) in this study (Supplementary Table 2). A total of 22 metabolites, comprising 11 known and 11 unknown metabolites, showed a nominally significant relation to HF (1.03 × 10⁻⁴ < P < 0.05, IVW method) (Table 1).
In the pathway analysis of the 11 known metabolites, we found that the "Valine, leucine, and isoleucine biosynthesis" metabolic pathway, which involves L-isoleucine, was significant (p = 0.026). L-isoleucine is an essential amino acid and must be obtained from the diet. A study (28) showed that the concentration of essential amino acids (including L-isoleucine) in the serum of chronic heart failure patients was significantly lower than that of the control group, suggesting that L-isoleucine may be associated with HF progression. In the enrichment analysis, however, we did not identify any significant (p < 0.05) metabolite sets (Supplementary Tables 3-5 and Supplementary Figures 1-3).
We then applied the four additional MR models (MR-Egger, weighted median, simple mode, and weighted mode) to estimate the causal effects between the 11 potentially HF-related metabolites and HF risk (Table 2). Two metabolites were significant in at least three MR models and showed consistent causal effects in all models (Table 2 and Figure 2), namely, 1-arachidonoylglycerophosphoethanolamine (P_IVW = 3.05 × 10⁻², P_MR-Egger = 2.48 × 10⁻², P_weighted median = 1.4 × 10⁻², P_simple mode = 1.2 × 10⁻¹, P_weighted mode = 6.72 × 10⁻²) and 4-vinylphenol sulfate (P_IVW = 2.16 × 10⁻³, P_MR-Egger = 1.69 × 10⁻¹, P_weighted median = 3.34 × 10⁻³, P_simple mode = 8.7 × 10⁻², P_weighted mode = 2.37 × 10⁻²).
For 4-vinylphenol sulfate, the overall results were similar across the five models. The point estimate from MR-Egger regression was similar to that from IVW, although the interval estimates were relatively wide (Figure 2A). We noted a possible outlier, although the funnel plot (Supplementary Figure 4) showed that the points were almost symmetrically distributed when individual SNPs were used as IVs (6 vs. 4). However, the corresponding causal effect values were less evenly distributed in the IVW and MR-Egger regression models, suggesting that the results obtained using these 10 SNPs as IVs may still be subject to potential bias.

Evaluation of the Reliability and Stability of the Results
We performed heterogeneity and horizontal pleiotropy tests on the 11 known metabolites (P < 0.05, IVW method) to evaluate the reliability and stability of the results. The P-values of the tests (including the MR-Egger and MR-PRESSO methods) were greater than 0.05 and the intercepts of the MR-Egger regressions were close to 0 (<0.1), indicating no evidence of heterogeneity or horizontal pleiotropy for these metabolites (Table 2 and Supplementary Table 6). For the two relatively robust metabolites (significant in at least three MR models: 1-arachidonoylglycerophosphoethanolamine and 4-vinylphenol sulfate), we performed sensitivity analyses using a leave-one-out approach to test stability. No single IV (SNP) of 4-vinylphenol sulfate materially influenced the result, suggesting a robust link between exposure and outcome, whereas four IVs (rs1984049, rs17031728, rs11081670, and rs39741) of 1-arachidonoylglycerophosphoethanolamine may have significantly affected the result (Figures 2C,D). After removing these four sensitive SNPs, we repeated the MR analyses using the five models and found that the results were no longer significant (Figure 2E).
FIGURE 3 | Graphical summary. Among 486 human blood metabolites, this study found that a 1-SD increase in the xenobiotic 4-vinylphenol sulfate was associated with ∼22% higher risk of HF by using a two-sample MR approach.

DISCUSSION
In this study, we performed an unbiased two-sample MR analysis to evaluate the causal relationships between 486 blood metabolites and HF risk. We collected the largest mGWAS and large HF GWAS summary data from public databases. Using genetic variants as IVs, we identified 11 known metabolites that were considered potential risk predictors of HF after the primary IVW analysis. To further assess the reliability and stability of the results, another four MR models and sensitivity analyses were performed. The results consistently supported that the xenobiotic 4-vinylphenol sulfate is related to increased HF risk (see Figure 3).
As a sulfate conjugate, 4-vinylphenol sulfate is one of the main metabolites of 4-vinylphenol in vivo (29). 4-Vinylphenol occurs naturally in crops such as peanut and wild rice (30) and is widely used as an ingredient in meat and seafood flavor formulations (PubChem CID: 62453). Our findings showed that a 1-SD increase in 4-vinylphenol sulfate was associated with a 22% higher risk of HF (IVW method), suggesting that long-term or excessive dietary intake of this compound or of 4-vinylphenol, especially from additives, may increase the likelihood of HF. Previous studies have shown that the blood level of 4-vinylphenol sulfate is closely related to smoking (31), which is a key risk factor for myocardial systolic dysfunction and hospitalization due to heart failure (32). Petersen et al. (33) showed a significant correlation between 4-vinylphenol sulfate and methylation at a site in RARA, which encodes a transcription factor that regulates differentiation and apoptosis (34); this evidence may be linked to the pathogenesis of HF.
Another metabolite worth mentioning is 1-arachidonoylglycerophosphoethanolamine, also referred to as LysoPE [20:4 (5Z,8Z,11Z,14Z)/0:0] or LPE (20:4/0:0). In the primary IVW analysis, this metabolite was related to a decreased risk of HF, and it showed a significant (P < 0.05) association with HF in another two MR models. However, it did not pass the final leave-one-out analysis. LysoPE [20:4 (5Z,8Z,11Z,14Z)/0:0] is an endogenous compound and a kind of lysolipid. Gao et al. (35) found that LysoPE 20:4 is significantly related to Qi deficiency syndrome in the treatment of congestive HF with traditional Chinese medicine, suggesting that it may be one of the specific metabolic biomarkers of congestive HF treated using traditional Chinese medicine granules. In addition, HF is associated with significant disturbances in phospholipid metabolism. A statistically significant decline in LysoPE level was found in patients with chronic HF with reduced ejection fraction (36). Supplementation with LysoPE in mammalian cells can reverse mitochondrial impairments (37). Our findings suggest that LysoPE 20:4 may have a protective influence on HF risk, providing interesting and valuable evidence for future studies.

Innovations and Limitations
Our study has some innovations. First, from the perspective of molecular mechanism, treating blood metabolites as exposure factors when exploring their causal relationships with HF risk has a solid theoretical basis and important clinical research value. Second, the study used strict quality control conditions and reasonable analysis methods, including a variety of models, to evaluate the causal effects, which supports the reliability and stability of the results. Third, unlike previous MR analyses of single exposure factors, the analysis of a large number of blood metabolites involves a heavy workload and presents analytical challenges. The analysis strategy we presented might provide a reference for similar studies.
Our study also has some limitations. To begin with, all the mGWAS and HF GWAS data were obtained from European populations, and thus comprehensive studies involving different ethnic groups are needed. Furthermore, half of the risk predictors of HF obtained by the preliminary analysis (IVW only) are unknown metabolites whose structures and functions are unclear, which limits the findings of the study. Finally, although we found that 4-vinylphenol sulfate is nominally causally related to heart failure using an unbiased two-sample MR approach, this relationship is statistical and we did not confirm it mechanistically. Hence, further work is needed to uncover the role of 4-vinylphenol sulfate in the pathogenesis of HF and thereby confirm this causal relationship.

CONCLUSIONS
In conclusion, we used a two-sample MR approach to explore the causal relationships between 486 blood metabolites and HF among more than 0.9 million Europeans. We found that a 1-SD increase in the xenobiotic 4-vinylphenol sulfate could nominally increase the risk of HF by 22%. Our findings strengthen our knowledge of the relationships between blood metabolites and HF, which may facilitate the establishment of personalized explanations or markers for biological differences in disease status.

AUTHOR CONTRIBUTIONS
ZW designed the study, performed data analysis, and drafted the manuscript. SC performed data analysis and drafted and revised the manuscript. QZ revised the manuscript. YW, GX, and GG collected the data. WL and JC provided the resources. SZ designed the study, led the study, and revised the manuscript.