Mendelian Randomization Identifies CpG Methylation Sites With Mediation Effects for Genetic Influences on BMD in Peripheral Blood Monocytes

Osteoporosis is mainly characterized by low bone mineral density (BMD) and is an increasingly serious public health concern. DNA methylation is a major epigenetic mechanism that may contribute to the variation in BMD and may mediate the effects of genetic and environmental factors of osteoporosis. In this study, we performed an epigenome-wide DNA methylation analysis in peripheral blood monocytes of 118 Caucasian women with extreme BMD values. Further, we developed and implemented a novel analytical framework that integrates Mendelian randomization with genetic fine mapping and colocalization to evaluate the causal relationships between DNA methylation and BMD phenotype. We identified 2,188 differentially methylated CpGs (DMCs) between the low and high BMD groups and distinguished 30 DMCs that may mediate the genetic effects on BMD. The causal relationship was further confirmed by eliminating the possibility of horizontal pleiotropy, linkage effect and reverse causality. The fine-mapping analysis determined 25 causal variants that are most likely to affect the methylation levels at these mediator DMCs. The majority of the causal methylation quantitative loci and DMCs reside within cell type-specific histone mark peaks, enhancers, promoters, promoter flanking regions and CTCF binding sites, supporting the regulatory potentials of these loci. The established causal pathways from genetic variant to BMD phenotype mediated by DNA methylation provide a gene list to aid in designing future functional studies and lead to a better understanding of the genetic and epigenetic mechanisms underlying the variation of BMD.


Part 1. Power analysis of using logistic regression to detect DMC
We performed simulation analysis to calculate the power of detecting methylation difference great than 0.05 using logistic regression. We considered the following two different scenarios: In scenario 1, the ground truth is that the difference in methylation statistically caused solely by BMD group is 0.05. To make it concise, we use a model of simple logistic regression, in which the only predictor variable of DNA methylation value at a CpG site is the BMD group. However, this scenario could also represent the case that one or more covariates have been adjusted.
In scenario 2, the total effect of BMD and other covariates on the mean difference in methylation is set to be 0.05. We used a model of multiple logistic regression, which contains five predictor variables (age, BMI, drinking status, smoking status, and 1st PC of methylation), same as what we used in the real differential methylation analysis.
At a CpG locus, the methylation data of subject group is simulated by equation (1)(2), where is the methylation status of subject group sequence ; = 1,2 … ; 0 = 54; 1 = 64; = 0,1; = 1, … , . In both scenarios, we control two parameters that affect the power, the coverage of the CpG site in each subject ( ) and the probability of methylation in the low BMD group ( 0 ). The group difference is set to be 0.05, i.e. 1 = 0 + 0.05.
The logistic regression in equation (3) was then fitted for the simulated data. The covariate part was omitted in scenario 1. In scenario 2, five covariates are simulated according to the groupspecific parameters in Table 1. For each parameter setting, the regression was replicated 1000 times. The empirical power is calculated as the number of times the null hypothesis ( 0 : = 0) is rejected over the number of replications. It is important to keep in mind that the sample size of (3) is the number of instead of , thus the sample size is ∑ . This fact has also been pointed out in the original paper of methylKit package.
(3) Figure S5-a shows the power versus 0 (varies from 0.1 to 0.9) at a mean coverage of 30 in the two scenarios. In scenario 1, the power of detecting 0.05 difference caused by BMD group is constantly greater than 80% regardless of the change of 0 . The power curve makes a U-shape with lower power at 0 =0.5 and higher power at 0 =0.1 and 0.9. In scenario 2, the power is very close to 80% for detecting the methylation difference in the BMD group adjusting for other covariates. There's a U-shape in the power curve but not as apparent as the simple logistic regression model (scenario 1).    Figure S6. Network of enriched terms. Each node represents an enriched term and is colored by its cluster name. Table S13. Top six clusters with their representative enriched GO terms. "Count" is the number of genes in the user-provided lists with membership in the given ontology term. "%" is the percentage of all of the user-provided genes that are found in the given ontology.