Exposing the Causal Effect of Body Mass Index on the Risk of Type 2 Diabetes Mellitus: A Mendelian Randomization Study

Introduction: High body mass index (BMI) is a positive associated phenotype of type 2 diabetes mellitus (T2DM). Abundant studies have observed this from a clinical perspective. Since the rapid increase in a large number of genetic variants from the genome-wide association studies (GWAS), common SNPs of BMI and T2DM were identified as the genetic basis for understanding their associations. Currently, their causality is beginning to blur. Materials and Methods: To classify it, a Mendelian randomisation (MR), using genetic instrumental variables (IVs) to explore the causality of intermediate phenotype and disease, was utilized here to test the effect of BMI on the risk of T2DM. In this article, MR was carried out on GWAS data using 52 independent BMI SNPs as IVs. The pooled odds ratio (OR) of these SNPs was calculated using inverse-variance weighted method for the assessment of 5 kg/m2 higher BMI on the risk of T2DM. The leave-one-out validation was conducted to identify the effect of individual SNPs. MR-Egger regression was utilized to detect potential pleiotropic bias of variants. Results: We obtained the high OR (1.470; 95% CI 1.170 to 1.847; P = 0.001), low intercept (0.004, P = 0.661), and small fluctuation of ORs {from -0.039 [(1.412 – 1.470) / 1.470)] to 0.075 [(1.568– 1.470) / 1.470)] in leave-one-out validation. Conclusion: We validate the causal effect of high BMI on the risk of T2DM. The low intercept shows no pleiotropic bias of IVs. The small alterations of ORs activated by removing individual SNPs showed no single SNP drives our estimate.


INTRODUCTION
Diabetes mellitus (DM) is characterized by a bunch of chronic metabolic diseases leading to insulinsecretion deficiency (Olokoba et al., 2012;Pan et al., 2013;Shi and Hu, 2014). High blood sugar levels in DM patients over a prolonged period impair body tissues, such as eye, kidney, heart, and so on. Currently, more than 400 million people suffer from diabetes worldwide, of which type 2 DM (T2DM) makes up about 90% (Olokoba et al., 2012;Pan et al., 2013;Shi and Hu, 2014). Most patients who suffer from T2DM are over the age of 40 (Olokoba et al., 2012;Pan et al., 2013;Shi and Hu, 2014). In theory, people have a long time to prevent T2DM under the right direction. To this end, researchers go out of their way to investigate the causes of T2DM. Observational studies exposed that body mass index (BMI) was strongly associated with the risk of being diagnosed with T2DM (Sanada et al., 2012;Ganz et al., 2014;Chen et al., 2015Chen et al., , 2016Zhao et al., 2017). In Sanada et al. (2012) conducted a 10-year retrospective cohort study on 969 men and 585 women (Sanada et al., 2012). They observed high BMI was an independent and dose-dependent risk factor for T2DM in Japanese patients (Sanada et al., 2012). In Ganz et al. (2014) directed a case-control study to assess the association between BMI and the risk of T2DM in the United States (Ganz et al., 2014). A positive association between them was found in 12,179 cases (> = 18 years old) and 25,177 controls (Ganz et al., 2014). The analogous studies without considering genetic factors almost came to a consistent conclusion.
After identifying a large number of BMI-associated and T2DM-associated loci in genome-wide association studies (GWAS), their common associated variants were then interpreted as the underlying cause of BMI and the risk of T2DM. In 2007, the first common variant in the FTO gene of BMI and T2DM was reported in European descents (Frayling et al., 2007). Subsequently, corresponding investigations sprung up for validating the existing common locus and identifying their novel common variants of BMI and T2DM (Andreasen et al., 2008;Herder et al., 2008;Cauchi et al., 2009;Legry et al., 2009;Webster et al., 2010;Song et al., 2012;Xi et al., 2014). In 2014, a meta-analysis of 42 studies for BMI and T2DM associated variants was conducted (Xi et al., 2014). Eventually, 4 statistically significant associated variants (FTO rs9939609, SH2B1 rs7498665, FAIM2 rs7138803, GNPDA2 rs10938397) were identified for both in Europeans.
Whether a higher BMI increases the risk of T2DM or T2DM affects BMI or their common genetic factors take effect, is still unknown according to current observations. In addition, after considering confounding factors, the causal relationship between BMI and T2DM may be reverse. To estimate the causal effect of BMI on the risk of T2DM, we conducted this Mendelian randomization (MR) study, which is an instrumental variable (IV) based method to infer causality of exposure and disease in observation studies. Genetic variants that are associated with intermediate phenotypic exposures are introduced as IVs by MR to estimate the effect of phenotypic exposures on a disease outcome ( Figure 1A). Due to random distribution of gene variants during gametogenesis, IV-based analysis can avoid reverse causality. The basic principle of estimating the influence of BMI on the risk of T2DM using MR is shown in the Figure 1B, where Z (e.g., variants) represents IV, X indicates exposure BMI, and Y is disease T2DM. Two assumptions should be suitable for the case before using MR.
1 The variants are robustly associated with BMI. 2 The variants are independent of the T2DM without considering BMI and confounders. It means the only way to influence the T2DM by the variants is via an intermediate.
The two assumptions mean the variants should be associated with BMI but not with T2DM. Therefore, the conclusions based on MR could not result from the common genetic factors of BMI and T2DM.

MATERIALS AND METHODS
Two summary-level data of GWAS datasets were utilized by MR analysis. One of them was for extracting significant BMI SNP sets to meet the assumption 1. And the other was for extracting no significant T2DM SNP sets to meet assumption 2. The intersections of these two SNP sets were then analyzed using MR.

Summary-Level Data for Associations Between Genetic Variants and BMI
In Locke et al. (2015) conducted a meta-analysis of BMI using GWAS on Metabochip studies . Totally, 322,154 individuals of European descents and 17,072 individuals of non-European descent were analyzed. As a result, 97 BMIassociated SNPs (P < 5 × 10 −8 ) were identified for European. The corresponding SNPs, effect allele (EA), allele frequencies, beta coefficients, and standard errors (SEs) were extracted from Genetic Investigation of Anthropometric Traits (GIANT) consortium (Locke et al., 2015) as summary-level data for associations between genetic variants and BMI.

Summary-Level Data for Associations
Between Genetic Variants and T2DM Morris et al. (2012) carried out a combined meta-analysis of European descents on two GWAS data sets (Yang et al., 2010;Lee et al., 2011), which involved 22,669 cases and 58,119 controls. All the variants were then genotyped with Metabochip involving 1,178 cases and 2,472 controls of Pakistani descent. The analytical result contains novel susceptibility locus together with other SNPs, SEs and their P-values on the risk of T2DM. These were utilized as summary-level data for associations between genetic variants and T2DM.

Data Processing and Analysis
Two summary-level datasets were processed into assumptionoriented data (Figure 2). According to assumption 2, genetic pleiotropy can result in over-precise estimates in subsequent analysis. According to the application principles of Mendelian randomization analysis, the study is based on Mendel's second law of inheritance: the separation and combination of genetic gametes controlling different traits do not interfere with each other; in the formation of gametes, the paired genetic gametes that determine the same trait are separated from each other, and the genetic gametes that determine different traits are freely combined. When the two genes are not completely independent, they will show a certain degree of linkage, a situation called linkage disequilibrium (LD), which will greatly affect the exclusiveness of the variable tool to phenotypic inheritance, leading the subsequent calculations bias generally called "over-precise estimates." To avoid this situation, these  loci with potential LD were removed from 97 BMI-associated SNPs, which was done by Noyce et al. (2017) in the previous study. The 97 SNPs were first ranked from the smallest to largest P-values. Then for the top ranked SNPs, Noyce et al. (2017) removed those in LD (R 2 threshold of 0.001) or those within 10,000 kb physical distance based on a reference dataset (Devuyst, 2015) from the 97 SNPs. This process was iterated for the remaining SNPs. As a result, 78 BMI-associated SNPs (P < 5 × 10 −8 ) without potential LD of each other were obtained. According Xi et al. (2014), meta-analysis, four SNPs (rs9939609, rs7498665, rs7138803, rs10938397) were found at the T2DM-associated locus, and were also further removed from these 78 SNPs. In addition, those SNPs with P-value less than 0.05 by Morris et al. (2012) were removed as well. Finally, 52 SNPs that confirmed to the two MR assumptions were retained for MR analysis.
Three subjects involving the influence of BMI on the risk of T2DM (Figure 2), the sensitivity of the disproportionate effects of variants, and the detection of bias due to pleiotropy were investigated in MR analysis. These issues were analyzed by MR method, leave-one-out validation, and MR-Egger regression (Bowden et al., 2015), respectively.
• MR method MR method was described in the previous study (Bowden et al., 2015) and summarized for evaluating the influence of BMI on the risk of T2DM as below. Assuming X, Y, and Z are BMI, T2DM, and variants, respectively, Wald ratio (β XY ) of BMI to T2DM through specified variant is calculated as follows: where β ZY represents the per-allele log(OR) of T2DM from summary-level data of Morris et al. (2012) study. β ZX is the perallele log(OR) of BMI from summary-level data of Locke et al. (2015) study. SE of BMI-T2DM association of each Wald ratio is defined as follows: where SE ZY and SE ZX represent the SE of the variant-T2DM and variant-BMI associations from corresponding summarylevel data, respectively. Subsequently, 95% confidence intervals (CIs) were then calculated from the SE of each Wald ratio. These summarized data were then estimated using inversevariance weighted (IVW) linear regression for meta-analysis. The meta-analysis model for the point estimate is according to the heterogeneity of the summarized data. Fixed effect model is used for no significant heterogeneity, and random-effect model is used for others.
To evaluate the genetic heterogeneity of summarized data, Cochran's Q-test and statistic I 2 were utilized here. Cochran's Q-test follows a χ 2 distribution with k−1 degrees of freedom, where k represents the number of variants for analysis. I 2 = (Q−(k−1))/Q × 100% ranges from 0 to 100%. P < 0.01 and I 2 > 50% were defined as the significant heterogeneity here .
• Leave-one-out validation To test the sensitivity of variants, we designed a leave-one-out validation measure. In brief, to test the influence of an SNP to the conclusion, the SNP was removed from the 52 SNPs to carry out IVW point estimate. The fluctuation of the results before and after removing the SNP reflects the sensitivity of this SNP. Here this process was iterated for each of these 52 SNPs. • MR-Egger test To ensure that violations of our analysis were not biasing the estimate of the directional causal association, MR-Egger regression asymmetry test was used (Bowden et al., 2015). The MR-Egger regression is adapted from Egger regression, which is a tool to detect small study bias in meta-analysis and test for bias from pleiotropy. The estimated value of the intercept in MR-Egger regression can be interpreted as an estimate of the average pleiotropic effect across the genetic variants. An intercept that differs from zero is indicative of overall directional pleiotropy. The slope coefficient from MR-Egger regression provides a bias estimate of the causal effect.
All statistical tests for MR analysis were undertaken using the R Package of meta-analysis 1 and Mendelian Randomization (Yavorska and Burgess, 2017).

RESULTS
Among the 97 BMI-associated SNPs (Locke et al., 2015), 19 SNPs with LDs, 2 T2DM-associated SNPs (rs7138803, rs10938397) from Xi et al. (2014) study, 20 T2DM-associated SNPs and 1 unmapped SNPs from Morris et al. (2012) study, and 3 uncertain SNPs were removed (Supplementary Table 1). 52 BMI-associated SNPs were eventually selected for the MR analysis in Table 1. Each line of the table documents 12 items involving the SNP, EA and its frequencies, beta coefficients of the SNP on the risk of BMI and T2DM, and SEs.

The Influence of BMI on the Risk of T2DM
The pooled results using IVW method from 52 individual SNPs showed that high BMI significantly increases the risk of T2DM. Due to the lack of evidence of heterogeneity between variants of the summarized data (P = 0.499 and I 2 = 0%; Figure 3), the fixed-effect model was utilized here for meta-analysis. The OR of T2DM per 5kg/m 2 higher BMI was 1.470 (95% CI 1.170 to 1.847; P = 0.001). In addition, we analyzed the effect of BMI on the risk of T2DM by six other methods involving Simple median, Weighted median, Penalized weighted median, Penalized IVW, Robust IVW, and Penalized robust IVW methods (Zhao et al., 2017). The results were shown in Table 2, which are consistent with the result based on IVW method.

Sensitivity Analysis
ORs from leave-one-out analysis were shown in  Pleiotropic Effect Analysis Figure 5 shows the symmetrical inverted funnel of the point estimate from individual variants. The effect estimated from MR-Egger regression was 1.24 (95% CI 0.553 to 1.928; P = 0.493), with an intercept of 0.004 (95% CI −0.013 to 0.020; P = 0.661; Figure 6). Together these findings provided evidence against the possibility that horizontal pleiotropic effects tend to be bias IVW estimates.

DISCUSSION
In this study, we exposed the causal effect of BMI on the risk of T2DM using MR method. Here, two-summary level data involving association between genetic variants and BMI from Locke et al. (2015) study and association between genetic variants and T2DM from Morris et al. (2012) study were utilized for this purpose. According to the previous investigation, the MR was viewed as a meta-analysis of multiple genetic variants (Bowden et al., 2015;Nordestgaard et al., 2017;Noyce et al., 2017;Wei et al., 2017). Since there was very low heterogeneity between variants of the summarized data (P = 0.499 and I 2 = 0%) (Figure 3), the fixed-effect model was utilized for meta-analysis. The pooled results of point estimates using IVW method indicate that the OR of T2DM per 5 kg/m 2 higher BMI was 1.470 (95% CI 1.170 to 1.847; P = 0.001). This evidence suggested that high BMI increases the risk of T2DM. Sensitivity analysis and bias analysis were then carried out for genetic variants. To test whether the results are influenced by individual SNPs, we conducted the leave-one-out validation. Results in Figure 4 indicate very small fluctuations after the removal of individual SNPs. The statistical evidence of MR-Egger regression (P = 0.493) with a very low intercept (0.004; Figure 6) indicates no significant bias of our data and no pleiotropic effect of the genetic variants, respectively.
The inference that the causal effect of BMI on the risk of T2DM from this study is valuable for both investigations and clinical practice. Although abundant observational studies identified the association between BMI and T2DM, a causal effect cannot be ascertained from these investigations. Especially when their common SNPs were identified in recent studies, these genetic variants were then deemed as the primary cause of the BMI-T2DM association by some of the researchers. In brief, current studies cannot help to understand how BMI is associated  with T2DM. The observation of this causal effect suggested that helping to decline BMI could be used as a potential method when developing T2DM prevention strategies. Excessive BMI means that the body is overweight or in most cases obese, and this is most likely as the real initial cause of T2DM. Obesity has become a pandemic disease worldwide, which has resulted in a significant increase in the incidence of diabetes, non-alcoholic fatty liver disease and coronary heart disease (Milic et al., 2014;Rao et al., 2015;Zhou et al., 2017). In obesity, the hypertrophy, hypoxia of fat cells, endoplasmic reticulum stress, lipids toxicity and many other factors can lead to adipocytokines dysfunction, increased vascular permeability, along with promoting immune cell infiltration into fat tissue, release of more inflammatory factors, and formation of a vicious circle of inflammatory reactions, leading to the persistence of chronic inflammatory states. It is now widely believed that inflammation plays a key mediator role in the development of type 2 diabetes (Ramalho and Guimaraes, 2008;Engin, 2017). Therefore, strengthening exercise, maintaining a reasonable diet and good fitness are still the iron we must adhere to. Our study benefits from both the GWAS data and MR method. Clinical statistics using typical methods exposed large number of the associations between diseases and phenotypic exposures. With the rapid increase in the identifications about the genetic basis of diseases and phenotypic exposures, using genetic variants for precise estimates of the causal effect of phenotype on disease by MR method, attracts more and more attention (Benn et al., 2017;Richmond et al., 2017;Rodriguez-Broadbent et al., 2017;Went et al., 2017). For example, Noyce et al. (2017) utilized the MR method for assessing the causal influence of BMI on the risk of Parkinson disease (PD). Nordestgaard et al. (2017) estimated the effect of BMI on Alzheimer's disease (AD). On account of multiple genetic variants of phenotypes, Bowden et al. (2015) proposed a strategy to view MR with multiple instruments as a meta-analysis and an MR-Egger method for analyzing bias caused by pleiotropy, which was widely used in current studies. Considering the fuzzy relation between BMI and T2DM, we conducted this MR analysis to specify their relationship.
The two assumptions were described in the "Introduction" section for our MR study. Following assumption 1, 97 BMIassociated SNPs were extracted from summary-level data of Locke et al. (2015) study. After removing SNPs with LD and T2DM-associated SNPs, 52 SNPs conforming to the assumption 2 were assigned for further analysis. In addition, MR requires that the genetic variants are independent of any known confounding variables. During to the lack of information about potential confounding factors of BMI and T2DM, no confounders were considered in this study. Therefore, our observation may be limited by this weakness. Link prediction Zhang et al., 2017;Peng et al., 2018a) and artificial intelligence methods (Cabarle et al., 2017;Peng et al., 2018b;Wei et al., 2017Wei et al., , 2018b may be used to solve this problem, which has been successfully applied in the prediction of disease genes (Peng et al., 2017;Zeng et al., 2017), miRNAs (Zeng et al., , 2018Zou et al., 2016), RNA methylation (Wei et al., 2018a), and drug-induced hepatotoxicity (Su et al., 2018).
In summary, the MR analysis in this article verified that high BMI can increase the risk of T2DM. It helps us to understand the pathogenic factor of T2DM. It also may help to enhance the molecular and phenotypic annotations of T2DM and human diseases (Cheng et al., 2016(Cheng et al., , 2018c, which could be further applied in analyzing diseases in a system biology perspective (Cheng et al., 2018a,b;Hu et al., 2018).