ORIGINAL RESEARCH article
Sec. Statistical Genetics and Methodology
Volume 11 - 2020 | https://doi.org/10.3389/fgene.2020.610852
Use of Multivariable Mendelian Randomization to Address Biases Due to Competing Risk Before Recruitment
- 1Graduate School of Public Health and Health Policy, City University of New York, New York, NY, United States
- 2School of Public Health, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong, China
- 3Department of Epidemiology and Biostatistics, School of Public Health, Faculty of Medicine, Imperial College London, London, United Kingdom
- 4Singapore Institute for Clinical Sciences (SICS), The Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
Background: Mendelian randomization (MR) provides estimates that are less open to confounding than those from conventional observational studies. However, MR is open to selection bias when recruitment into the underlying sample depends on surviving both the genetically instrumented exposure and competing risk of the outcome. Few methods to address this bias exist.
Methods: We show that this selection bias can sometimes be addressed by adjusting for common causes of survival and outcome. We use multivariable MR to obtain a corrected MR estimate for statins on stroke. Statins affect survival, and stroke typically occurs later in life than ischemic heart disease (IHD), making estimates for stroke open to bias from competing risk.
Results: In univariable MR in the UK Biobank, genetically instrumented statins did not protect against stroke [odds ratio (OR) 1.33, 95% confidence interval (CI) 0.80–2.20] but did in multivariable MR (OR 0.81, 95% CI 0.68–0.98) adjusted for major causes of survival and stroke [blood pressure, body mass index (BMI), and smoking initiation] with a multivariable Q-statistic indicating absence of selection bias. However, the MR estimate for statins on stroke using MEGASTROKE remained positive and the Q statistic indicated pleiotropy.
Conclusion: MR studies of harmful exposures on late-onset diseases with shared etiology need to be conceptualized within a mechanistic understanding so as to identify any potential bias due to survival to recruitment on both genetically instrumented exposure and competing risk of the outcome, which may then be investigated using multivariable MR or estimated analytically and results interpreted accordingly.
Mendelian randomization (MR), i.e., instrumental variable analysis with genetic instruments, is an increasingly popular and influential analytic technique (Davies et al., 2018; Taubes, 2018), which can be used to investigate causal effects even when no study including both exposure and outcome of interest exists. Invaluably, MR studies have provided estimates more consistent with results from randomized controlled trials (RCTs) than conventional observational studies, even foreshadowing the results of major trials (Holmes et al., 2017). MR studies are often presented as observational studies analogous to RCTs (Davey Smith and Ebrahim, 2005; Burgess et al., 2012) because they take advantage of the random assortment of genetic material at conception, while observational studies are open to biases from confounding and selection bias (Bareinboim and Pearl, 2016). Instrumental variable analysis is described in health research as addressing confounding (Greenland, 2000; Maciejewski and Brookhart, 2019), i.e., bias from common causes of exposure and outcome (Bareinboim and Pearl, 2016). MR is currently described as “less likely to be affected by confounding or reverse causation than conventional observational studies” (Davies et al., 2018).
Mendelian randomization was originally thought to be less open to selection bias than conventional observational studies (Smith and Ebrahim, 2004). Selection bias is now increasingly widely recognized as a limitation of MR (Nitsch et al., 2006; Boef et al., 2015; Canan et al., 2017; Munafo et al., 2017; Swanson et al., 2017; Gkatzionis and Burgess, 2018; Munafo and Smith, 2018; Vansteelandt et al., 2018; Hughes et al., 2019; Swanson, 2019), which may violate the instrumental variable assumptions. Sources of potential selection bias in MR have been specifically identified as selecting an unrepresentative sample (Munafo et al., 2017; Munafo and Smith, 2018; Hughes et al., 2019), attrition from an initially representative sample, such as a birth cohort (Munafo et al., 2017), and selecting a sample strongly on surviving the exposure (Gkatzionis and Burgess, 2018) or genotype of interest (Vansteelandt et al., 2018; Smit et al., 2019). What has not explicitly been considered is selecting the underlying sample(s) on surviving the genotype of interest in the presence of competing risk of the outcome. MR studies are particularly vulnerable to sample selection on survival because of the time lag between genetic randomization (at conception) and typical recruitment into genetic studies of major diseases in middle to old age. MR studies also often concern major causes of death thought to share considerable etiology. For example, lipids, blood pressure, diabetes, lifestyle (such as smoking, diet, physical activity, and sleep), and socioeconomic position cause both ischemic heart disease (IHD) and ischemic stroke, with death from IHD typically occurring at younger ages than death from stroke (Kesteloot and Decramer, 2008; Menotti et al., 2019).
As a result, a study of the association of lipid modifiers with stroke among the living will automatically select on surviving high lipids and on surviving competing risk of prior death from IHD due to shared etiology between IHD and stroke. Some people dying from genetically high lipids and others dying from IHD before recruitment into a stroke study will leave a shortage of people available to recruit with genetically high lipids and susceptibility to stroke, thereby obscuring any effect of lipids or lipid modifiers on stroke. Correspondingly, MR studies suggest less effect of lipids and lipid modifiers on stroke than IHD (Hopewell et al., 2018; Valdes-Marquez et al., 2019), although RCTs suggest similar effects (Mills et al., 2011; Chou et al., 2016; Schmidt et al., 2017). Similarly, MR studies do not consistently show detrimental effects of body mass index (BMI) on stroke (Marini et al., 2020). In this study, we explain how potential violations of the instrumental variable assumptions due to inadvertently recruiting survivors of the genetically predicted exposure and competing risk of the outcome may bias MR estimates. We explain how this bias might be corrected using multivariable MR and provide a simple means of estimating how large the bias is likely to be.
Materials and Methods
Potential Biasing Pathways Due to Recruiting on Selective Survival
Figure 1A shows the directed acyclic graph for MR illustrating the instrumental variable assumptions typically referred to as relevance, independence, and exclusion restriction. Relevance is explicitly indicated by the arrow from instrument to exposure. Independence is implicitly indicated by the lack of an arrow from confounders of exposure on outcome (or of instrument on outcome) to instrument. Exclusion restriction is implicitly indicated by the lack of arrows linking instrument to outcome, sometimes illustrated as no arrow from instrument to outcome indicating no pleiotropy (Bowden et al., 2015, 2016; Hartwig et al., 2017; Verbanck et al., 2018) (Figure 1B). Figure 1C shows selection on survival of both instrument and common causes of the outcome (U2) (Hughes et al., 2019; Swanson, 2019), which also violates the exclusion restriction assumption, particularly when stated as “every unblocked path connecting instrument and outcome must contain an arrow pointing into the exposure” (Pearl, 2009). Figure 1D explicitly shows survival on instrument, and another disease (Y2) sharing etiology (U2) with the outcome (Y). Figure 1E shows the exclusion restriction assumption with both no pleiotropy and no selection bias from competing risk (U2) made explicit. Notably, Figures 1C–E are very similar in structure to a well-known example of selection bias, which occurs when conditioning on an intermediate (or covariable adjustment) reverses the direction of effect: the “birth weight” paradox (Hernandez-Diaz et al., 2006). In the birth weight paradox adjusting the association of maternal smoking with infant death for birth weight makes maternal smoking look protective; further adjusting for all common causes of birth weight and infant death, thought to be birth defects, should remove this bias (Hernandez-Diaz et al., 2006) by blocking the path from maternal smoking to infant death via birth weight and birth defects. 
Similarly, bias due to inadvertently selecting the underlying sample in an MR study on surviving the genetically instrumented exposure and surviving competing risk of the outcome should be ameliorated by adjusting for major causes of survival and the outcome (Figure 2). The recent development of multivariable MR (Sanderson et al., 2019) provides the means to do so. Specifically, as indicated in Figures 1C,D, where univariable MR may be biased, using multivariable MR adjusting for the main determinants of survival and outcome may reduce bias by at least partially blocking any backdoor paths from instrument to outcome.
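The collider structure in Figures 1C,D can be illustrated with a toy simulation. The sketch below is not part of the analyses reported here, and every probability in it is a hypothetical value chosen only to make the mechanism visible. A binary variant Z and a binary shared cause U2 both reduce survival to recruitment, U2 alone causes the outcome Y, and Z has no effect on Y.

```python
import random


def simulate(n=200_000, seed=1):
    """Return the risk difference in Y by Z in everyone and in survivors only."""
    rng = random.Random(seed)
    # counts[group][z] = [number of people, number with the outcome]
    counts = {"all": [[0, 0], [0, 0]], "survivors": [[0, 0], [0, 0]]}
    for _ in range(n):
        z = rng.random() < 0.5    # harmful variant (hypothetical frequency)
        u2 = rng.random() < 0.5   # shared cause of survival and outcome
        # Survival to recruitment depends on both Z and U2 (hypothetical values).
        survived = rng.random() < 0.9 - 0.3 * z - 0.3 * u2
        # The outcome depends on U2 only, so the true effect of Z on Y is null.
        y = rng.random() < (0.5 if u2 else 0.05)
        groups = ("all", "survivors") if survived else ("all",)
        for g in groups:
            counts[g][z][0] += 1
            counts[g][z][1] += y

    def risk_difference(c):
        # Risk of Y among carriers (z = 1) minus risk among non-carriers (z = 0).
        return c[1][1] / c[1][0] - c[0][1] / c[0][0]

    return risk_difference(counts["all"]), risk_difference(counts["survivors"])


rd_all, rd_survivors = simulate()
print(f"risk difference, everyone:  {rd_all:+.3f}")
print(f"risk difference, survivors: {rd_survivors:+.3f}")
```

Under these assumed values the variant is unrelated to the outcome in the full population but appears protective among survivors, because survivors who carry the variant are less likely to also carry U2.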
Figure 1. Directed acyclic graphs with instrument (Z), outcome (Y), exposure (X), confounders (U1), and survival (S), where a box indicates selection, for (A) a valid Mendelian randomization study and (B) a Mendelian randomization study with an invalid instrument through violation of the exclusion-restriction assumption via pleiotropy, (C) a Mendelian randomization study with an invalid instrument through violation of the exclusion-restriction assumption via survival on instrument and shared etiology with the outcome (U2), (D) a Mendelian randomization study with an invalid instrument through violation of the exclusion restriction assumption via survival (S), competing risk of another disease (Y2) and shared causes (U2) with (Y2) and the outcome (Y), and (E) a Mendelian randomization illustrating both conditions which have to be met to satisfy the exclusion restriction assumption.
Figure 2. Directed acyclic graphs showing how selection bias could occur because of selection on survival (S), indicated by a box, on the instrument (GV) and on competing risk of ischemic heart disease (IHD) which shares causes with the outcome of interest, i.e., stroke, with U1 as confounders of exposure and outcome, when assessing (A) effects of an exposure on stroke or AF, (B) effects of lipid modifiers on stroke, and (C) effects of body mass index on stroke.
In addition, to provide triangulation, the level of selection bias due to surviving to recruitment on genetically instrumented exposure in the presence of competing risk of the outcome can also be thought of as depending on the proportion of the exposed who are not available for recruitment because of prior death due to the genetically predicted exposure and the proportion of those who could have experienced the outcome who are not available for recruitment because of prior death from a competing risk. Assuming these proportions are independent and their corresponding probabilities do not sum to more than 1, then for an observed odds ratio (OR) greater than 1, the true OR for genetically predicted exposure on disease can be estimated as the observed OR multiplied by the ratio of the probability of surviving the exposure and the competing risk to the probability of surviving the exposure or the competing risk, as shown in Appendix Table 1.
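One way to write this approximation, offered as a reading of the description above rather than as a reproduction of Appendix Table 1, uses two symbols introduced here for illustration: $p_E$, the proportion of the exposed lost before recruitment because of the genetically predicted exposure, and $p_C$, the proportion of potential cases lost before recruitment to a competing risk. The sketch assumes that exposed potential cases are depleted by $p_E + p_C$, exposed non-cases by $p_E$, unexposed potential cases by $p_C$, and unexposed non-cases not at all. With $a, b, c, d$ denoting the exposed cases, exposed non-cases, unexposed cases, and unexposed non-cases of the complete two-by-two table:

```latex
\begin{aligned}
\mathrm{OR}_{\text{true}} &= \frac{a\,d}{b\,c}, \\
\mathrm{OR}_{\text{obs}} &= \frac{a\,(1 - p_E - p_C)\; d}{b\,(1 - p_E)\; c\,(1 - p_C)}
  = \mathrm{OR}_{\text{true}} \times \frac{1 - p_E - p_C}{(1 - p_E)(1 - p_C)}, \\
\mathrm{OR}_{\text{true}} &= \mathrm{OR}_{\text{obs}} \times \frac{(1 - p_E)(1 - p_C)}{1 - p_E - p_C},
  \qquad p_E + p_C < 1 .
\end{aligned}
```

Because $(1 - p_E)(1 - p_C) \ge 1 - p_E - p_C$, the correction factor is never below 1, so under these assumptions selective survival pulls the observed OR for a harmful exposure toward, and possibly past, the null.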
Examples of Selection Bias and Amelioration
We investigated effects of lipid modifiers and BMI on ischemic stroke as possible exemplars, because previous MR studies of these exposures on stroke have not always given the expected results (Hopewell et al., 2018; Marini et al., 2020). Statins and PCSK9 inhibitors are very well-established interventions for cardiovascular disease, which reduce low-density lipoprotein (LDL)-cholesterol, IHD (Mills et al., 2011; Chou et al., 2016; Schmidt et al., 2017), stroke (Mills et al., 2011; Chou et al., 2016; Schmidt et al., 2017), and atrial fibrillation (AF) (Peng et al., 2018). BMI is also known to be harmful. IHD, stroke, and AF also share major causes independent of lipid modifiers, such as blood pressure (Emdin et al., 2015; Ettehad et al., 2016), smoking, lifestyle, and socioeconomic position. Death from IHD typically occurs at earlier ages than death from stroke in Western populations (Kesteloot and Decramer, 2008; Menotti et al., 2019). AF may also be a consequence of IHD. Figure 2A suggests bias would be expected for harmful exposures on stroke or AF in any sample of survivors, such as middle-aged or older adults. Adjusting for major factors causing survival to recruitment into the underlying studies of stroke or AF, as shown for lipid modifiers on stroke (Figure 2B) or BMI on stroke (Figure 2C), should reduce the bias. As such, univariable MR, even with well-defined genetic instruments free from genetic pleiotropy, might generate biased estimates due to selection bias violating the exclusion-restriction assumption, but appropriate use of multivariable MR might ameliorate the problem.
We used well-established independent genetic variants to mimic effects of statins (rs12916) and proprotein convertase subtilisin/kexin type 9 (PCSK9) inhibitors (rs11206510, rs2149041, and rs7552841) (Ference et al., 2019), and for BMI (96 variants) (Locke et al., 2015). Using two-sample univariable MR, we applied these variants to major GWAS, in people largely of European descent, of IHD (CARDIoGRAMplusC4D 1000 Genomes) (Nikpay et al., 2015), stroke (MEGASTROKE) (Malik et al., 2018), and AF (Nielsen et al., 2018). We also used the UK Biobank summary statistics for IHD and stroke (Zhou et al., 2018), but not for AF because the AF GWAS includes the UK Biobank data (Nielsen et al., 2018). We obtained univariable MR estimates by meta-analyzing the Wald estimates (genetic variant on outcome divided by genetic variant on exposure) using inverse variance weighting, with multiplicative random effects, after aligning variant estimates on the same-effect allele in each study.
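The univariable estimator described above can be sketched in a few lines. This is a minimal illustration, not the code used for the analysis: the function name and the input values are hypothetical, the standard error of each Wald estimate uses the common first-order approximation (standard error of the variant-outcome association divided by the absolute variant-exposure association), and the multiplicative random-effects step is implemented as a rescaling of the fixed-effect standard error that is never allowed to shrink it.

```python
import math


def ivw_random_effects(bx, by, se_by):
    """Inverse variance weighted meta-analysis of per-variant Wald estimates.

    bx    : variant-exposure associations (aligned to the same effect allele)
    by    : variant-outcome associations (log odds ratios)
    se_by : standard errors of the variant-outcome associations
    Returns (estimate, standard error, Cochran's Q).
    """
    ratios = [y / x for x, y in zip(bx, by)]            # Wald estimates
    ses = [s / abs(x) for x, s in zip(bx, se_by)]       # first-order SEs
    weights = [1 / s**2 for s in ses]
    estimate = sum(w * r for w, r in zip(weights, ratios)) / sum(weights)
    q = sum(w * (r - estimate) ** 2 for w, r in zip(weights, ratios))
    dof = len(ratios) - 1
    # Multiplicative random effects: inflate the variance by Q / dof,
    # but never deflate it below the fixed-effect variance.
    phi = max(1.0, q / dof) if dof > 0 else 1.0
    se = math.sqrt(phi / sum(weights))
    return estimate, se, q


# Hypothetical summary statistics for three variants (illustration only).
est, se, q = ivw_random_effects(
    bx=[0.10, 0.20, 0.10],
    by=[0.020, 0.030, 0.025],
    se_by=[0.010, 0.010, 0.010],
)
lower, upper = est - 1.96 * se, est + 1.96 * se
print(f"OR {math.exp(est):.2f} (95% CI {math.exp(lower):.2f} to {math.exp(upper):.2f}), Q = {q:.3f}")
```

With a single variant, as for the statin instrument, the function reduces to the Wald estimate itself and Q is zero.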
We used multivariable two-sample MR to obtain MR estimates for the lipid modifiers on stroke and AF adjusted for major causes of survival (smoking initiation, blood pressure, and BMI) (Forouzanfar et al., 2015; Sakaue et al., 2020) and stroke, and to obtain an MR estimate for BMI on stroke adjusted for smoking initiation. We used published independent genetic instruments for smoking initiation (327 variants) (Larsson et al., 2020), systolic blood pressure (SBP) and diastolic blood pressure (DBP) [all replicated variants (SBP 215, DBP 219)] (Evangelou et al., 2018), and BMI (96 variants) (Locke et al., 2015). Genetic associations, for all the instruments selected, with LDL-cholesterol, ever smoking, SBP, DBP, and BMI, were obtained from the UK Biobank summary statistics adjusted for age, sex, age², sex × age, and sex × age², and the first 20 principal components. We used the clump_data function from the MR-Base R package with r² < 0.05 to obtain independent genetic variants across exposures and the MendelianRandomization package to obtain IVW multivariable estimates. Here, we used summary statistics, meaning we assumed linear and homogeneous effects for all exposures. We reported the multivariable conditional F-statistic as a measure of instrument strength and the multivariable Q-statistic as a measure of instrument pleiotropy (Sanderson et al., 2019), obtained using the MVMR package (Sanderson et al., 2019). Calculation of the conditional F-statistic and the multivariable Q-statistic requires the covariance between the effects of genetic variants on each exposure or use of non-overlapping samples for the exposure GWAS (Sanderson et al., 2019).
Use of summary statistics for the exposures makes it difficult to obtain their covariance, so we largely selected genetic instruments for exposures from non-overlapping samples; however, some overlap exists, for example, the GWAS used to obtain genetic instruments for smoking initiation and blood pressure both included the UK Biobank (as 33 and ∼40% of the sample, respectively) (Forouzanfar et al., 2015; Locke et al., 2015; Evangelou et al., 2018; Larsson et al., 2020; Sakaue et al., 2020). As such, the conditional F-statistic gives a lower bound for strength of the instruments and the modified Q-statistic gives an upper bound on bias from pleiotropy (Sanderson et al., 2019). Notably, in this context, a significant multivariable Q statistic may indicate genetic pleiotropy or violation of the exclusion restriction assumption by selection bias, because both might inflate the multivariable Cochran Q. If the same instruments give very different multivariable Cochran’s Q for the same outcomes in different studies or for related outcomes in the same study, it would suggest that estimates with higher Cochran’s Q are more likely open to selection bias than genetic pleiotropy. We also reported the multivariable MR-Egger intercept which may indicate genetic pleiotropy (Rees et al., 2017).
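The multivariable IVW estimate and a heterogeneity statistic of the kind discussed above can be sketched without any packages. This is an illustration under stated assumptions, not the MendelianRandomization or MVMR implementation: function names and inputs are hypothetical, the estimate is a weighted regression through the origin of variant-outcome on variant-exposure associations, and the Q-statistic assumes zero covariance between the variant-exposure estimates (as with non-overlapping exposure samples).

```python
def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def mv_ivw(bx, by, se_by):
    """Multivariable IVW: one row of bx per variant, one column per exposure."""
    k = len(bx[0])
    w = [1 / s**2 for s in se_by]
    xtwx = [[sum(wj * row[i] * row[j] for wj, row in zip(w, bx)) for j in range(k)]
            for i in range(k)]
    xtwy = [sum(wj * row[i] * y for wj, row, y in zip(w, bx, by)) for i in range(k)]
    return solve_linear(xtwx, xtwy)


def mv_q(bx, se_bx, by, se_by, beta):
    """Heterogeneity statistic allowing for uncertainty in bx (zero covariance assumed)."""
    q = 0.0
    for row, se_row, y, s in zip(bx, se_bx, by, se_by):
        residual = y - sum(b * x for b, x in zip(beta, row))
        variance = s**2 + sum((b * sx) ** 2 for b, sx in zip(beta, se_row))
        q += residual**2 / variance
    return q


# Hypothetical associations for four variants and two exposures.
bx = [[0.10, 0.00], [0.00, 0.20], [0.10, 0.10], [0.20, 0.05]]
se_bx = [[0.01, 0.01]] * 4
by = [0.052, -0.041, 0.029, 0.091]
se_by = [0.01] * 4
beta = mv_ivw(bx, by, se_by)
print("estimates:", [round(b, 3) for b in beta])
print("Q:", round(mv_q(bx, se_bx, by, se_by, beta), 3))
```

A large Q relative to its degrees of freedom (variants minus exposures) signals that the variants disagree more than sampling error allows, which in this context could reflect genetic pleiotropy or selection bias.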
This study only used publicly available genetic summary statistics, collected with consent, and so does not require ethical approval.
As expected, the cases recruited into the underlying GWAS (Nikpay et al., 2015; Malik et al., 2018; Nielsen et al., 2018) seemed to be youngest for IHD and oldest for AF with stroke somewhere in between (Supplementary Table 1). In univariable MR, genetically mimicking statins or PCSK9 inhibitors reduced IHD, while genetically instrumented BMI increased IHD (Table 1). Estimates were similar using CARDIoGRAMplusC4D 1000 Genomes and the UK Biobank. IHD is not expected to be substantially affected by competing risk, so it was not considered further. In univariable MR, genetically mimicking statins or PCSK9 inhibitors was not associated with a lower risk of stroke or AF; some estimates for statins were in the direction opposite to that expected (Table 1). In univariable MR, genetically instrumented BMI did not consistently increase stroke but did increase AF (Table 1). Univariable MR estimates for the major causes of survival considered are shown in Supplementary Table 2.
Table 1. Effect of genetically mimicking statins and PCSK9 inhibitors use (Ference et al., 2019) (in effect size of LDL-cholesterol) and BMI (Locke et al., 2015) on IHD using the CARDIoGRAMplusC4D 1000 Genomes based GWAS (Nikpay et al., 2015) and the UK Biobank on all ischemic stroke using MEGASTROKE (Malik et al., 2018) and the UK Biobank and on AF using a study by Nielsen et al. (2018) from univariable Mendelian randomization and from multivariable Mendelian randomization, with genetically mimicked statins and PCSK9 inhibitors adjusted for systolic blood pressure (Evangelou et al., 2018), diastolic blood pressure (Evangelou et al., 2018), smoking initiation (Larsson et al., 2020) and BMI (Locke et al., 2015), and BMI adjusted for smoking initiation.
In multivariable MR, the conditional F-statistics for each exposure were similar in each analysis, suggesting similar instrument strength (Table 1). The Q-statistics were not significant for lipid modifiers on UK Biobank stroke (Table 1). The multivariable MR estimates in the UK Biobank, in contrast to the corresponding univariable MR estimates, showed that genetically instrumented lipid modifiers protected against stroke and that genetically instrumented BMI caused stroke (Table 1). The multivariable MR-Egger intercepts were significant, with largely similar MR-Egger estimates for statins [OR 0.70, 95% confidence interval (CI) 0.56–0.88] and PCSK9 inhibitors (OR 0.66, 95% CI 0.53–0.83) but not BMI (OR 1.00, 95% CI 0.83–1.20). The Q-statistics were highly significant for lipid modifiers and BMI on MEGASTROKE stroke and AF (Table 1), indicating that these estimates were likely still biased by pleiotropy, probably from selection bias, given that the same instruments gave estimates apparently unbiased by genetic pleiotropy for stroke in the UK Biobank. Correspondingly, the multivariable MR estimates were similar to the univariable estimates, and for lipid modifiers differed from those expected from RCTs (Table 1). The multivariable MR-Egger intercepts were not significant for MEGASTROKE estimates or for BMI on AF but were significant for statins and PCSK9 inhibitors on AF. The corresponding multivariable MR-Egger estimates gave directionally similar estimates to the inverse variance weighted estimates for genetically mimicked statins (OR 1.06, 95% CI 0.92–1.23) and PCSK9 inhibitors (OR 1.01, 95% CI 0.87–1.17).
To provide triangulation, we estimated whether the level of selection bias for statins on stroke, from surviving genetically instrumented statins and IHD, was consistent with the univariable estimate, using the formula given in Appendix Table 1. The OR for the protective allele of the statin single-nucleotide polymorphism (rs12916) on IHD used to obtain the Wald estimate was 0.96. Assuming statins have the same effect on IHD and stroke, it would only take 10% of those with the harmful allele and 25% of potential stroke cases to have died from IHD or other competing risks before recruitment into a stroke study for the observed OR to be approximately 1.0, which would give a null MR estimate. If instead 40% of potential stroke cases had died from competing risk before recruitment, then the OR would reverse to 1.04 and give an MR estimate similar to the univariable estimate from MEGASTROKE.
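The arithmetic behind these figures can be checked with a short calculation. The sketch assumes one reading of the approximation described in the Methods: the observed OR equals the true OR multiplied by (1 − p_exposure − p_competing) / [(1 − p_exposure)(1 − p_competing)], where the two proportions (names introduced here) are those lost before recruitment to the exposure and to competing risk. Appendix Table 1 is not reproduced here, so this form is an assumption, although it returns values close to those quoted above.

```python
def observed_or(true_or, p_exposure, p_competing):
    """Observed odds ratio after selective survival (assumed form, see lead-in).

    true_or     : odds ratio for the harmful allele in the absence of selection
    p_exposure  : proportion of harmful-allele carriers lost before recruitment
    p_competing : proportion of potential cases lost to a competing risk
    """
    if p_exposure + p_competing >= 1:
        raise ValueError("the two proportions must sum to less than 1")
    factor = (1 - p_exposure - p_competing) / ((1 - p_exposure) * (1 - p_competing))
    return true_or * factor


# The protective allele has OR 0.96, so the harmful allele has OR 1 / 0.96.
true_or_harmful = 1 / 0.96
for p_competing in (0.25, 0.40):
    harmful = observed_or(true_or_harmful, 0.10, p_competing)
    protective = 1 / harmful
    print(f"competing-risk loss {p_competing:.0%}: "
          f"observed OR per protective allele = {protective:.2f}")
```

With 10% of harmful-allele carriers lost, the observed OR per protective allele is close to 1.00 when 25% of potential cases are lost to competing risk and close to 1.04 when 40% are lost.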
Here, we have shown theoretically, empirically, and analytically that univariable MR studies can be open to quite severe selection bias likely arising from selective survival on genetically instrumented exposure when other causes of survival and outcome exist, i.e., competing risk before recruitment. We have also explained the relevance of this situation to the assumptions of MR, as a violation of the exclusion restriction assumption, how to mitigate this bias using multivariable MR, how to assess the success of this mitigation (using the multivariable Q statistic), and how to make an assessment of the possible level of bias using an approximation based on contextual knowledge (Appendix Table 1). Notably, genetic studies are particularly vulnerable to bias because most genetic estimates are of small magnitude; the closer the true estimate is to the null, the easier it is for a reversal to occur (Appendix Figure 1).
Our study differs from many other studies suggesting that MR is open to selection bias by specifically identifying when such bias can occur in the context of a typical MR study using existing GWAS, and by showing how any such bias may be addressed along with a means of checking whether the bias has been successfully addressed. For participants selected on surviving the genetically instrumented exposure and competing risk of the outcome, our study is similar to other studies about bias in MR in showing that bias can occur from using GWAS summary statistics with “covariable adjustment” (Hartwig et al., 2020). We add by explaining that selecting from the living is common in MR studies and may engender covariable adjustment on survival. Rather than suggesting that such situations should be avoided (Hartwig et al., 2020), precluding MR studies of a harmful exposure on a late-onset disease subject to competing risk, we show how such situations can be addressed. Specifically, external knowledge can be used to identify potential common causes of survival and outcome, followed by multivariable MR to adjust for them and thereby possibly obtain a less biased estimate, bearing in mind the Q statistic. We also show that when, in this situation, it is not possible to adjust comprehensively for factors causing survival and the outcome, the level of potential bias can be estimated (Appendix Table 1). Alternatively, restricting MR studies to younger people will usually reduce bias because death prior to recruitment is less common in younger people. However, these studies may need to consider competing risk after recruitment. Our study also implies that care should be taken in interpreting phenome-wide association studies identifying the effect of a specific genetically instrumented exposure across the phenome, because the effects of harmful exposures observed will vary depending on the level of competing risk of the outcome.
Despite the strengths of our study in explicating and providing means of addressing a relatively common bias in univariable MR, limitations exist. First, use of multivariable MR to address bias arising from sample selection on survival requires knowledge of the underlying causal structure and suitable genetic instruments for all sources of bias. In all observational studies, knowledge of the underlying causal structure is needed to identify potential sources of confounding and selection bias. For example, here our results could also be due to removing the harmful effects of statins and PCSK9 inhibitors via body composition by adjusting for BMI, although these effects are still under investigation (Nelson et al., 2019). Alternative methods to recover from selection bias due to surviving the genetically instrumented exposure and competing risk of the outcome that do not require knowledge of the underlying causal structure or additional data would be easier to use. Second, our study did not conduct simulations of the level of bias. Simulations including research questions with the same underlying directed acyclic graphs as investigated here have been done (Hartwig et al., 2020), and simulation of a similar situation is available (Glymour and Vittinghoff, 2014). The key issue in making use of these simulations is appreciating when these biasing situations might arise and how serious the issues can be in practice, which is the gap addressed by this study. As such, we address appreciating which real-life situations will result in the simulated bias, and what to do to ameliorate it. Third, we provide a means of addressing any such selection bias using multivariable MR (adjusting for common causes of survival and outcome) as well as a means of assessing the likely validity of the revised estimate (non-significant multivariable Q-statistic). However, application and interpretation may not always be straightforward.
As with any bias correction by adjustment, it may not be feasible to recover the correct estimate, due to lack of contextual knowledge, a highly interrelated causal structure, such as the genetic instruments causing common causes of survival and outcome, or a lack of relevant information. Fourth, we also provide an approximation to estimate the likely effects of such bias (Appendix Table 1). However, given that the role of selection bias due to death before recruitment from the genetically predicted exposure or from a competing risk of the outcome has rarely been explicitly considered previously, the information needed to identify the sources of bias and estimate the likely level of bias is not easily available. More research concerning the effects of genetic exposures on longevity and the sequence of death from different diseases in different populations would be helpful, as well as easily accessible information about the age and sex structure of participants in genetic studies by case status. Fifth, we do not provide an exhaustive list of examples of when this bias has occurred, because few MR studies have been validated against RCTs. For example, Alzheimer’s disease usually occurs in old age and appears to share causes with determinants of longevity (Deelen et al., 2019), so MR studies of harmful exposures on Alzheimer’s disease could be open to selection bias but the true causes of Alzheimer’s disease are unknown making any determination of whether the MR studies are biased or not difficult. Finally, the issue of obtaining valid estimates in the presence of selective survival on exposure and competing risk of the outcome is similar to the issue of obtaining valid genetic estimates in other studies of survivors, i.e., patients. The current solution for obtaining valid estimates in genetic studies of patients relies on the assumption that the factors causing disease and disease progression differ (Dudbridge et al., 2019). 
Suitable use of multivariable MR to adjust observational studies in patients might bear consideration.
Specifically, as regards the example here, for the MR estimate for statins on stroke, we were able to recover a plausible estimate in the UK Biobank but not in MEGASTROKE. The UK Biobank participants are younger (∼57 years) than the MEGASTROKE participants (Supplementary Table 1), so the confounders of survival to recruitment and stroke used to adjust for survival could also be more biased by survival in MEGASTROKE making adjustment less effective in MEGASTROKE than in the UK Biobank, possibly as indicated by Supplementary Table 2. In addition, the Q-statistic represents both genetic pleiotropy and pleiotropy due to selection bias, so it is possible that the Q-statistic in MEGASTROKE is larger due to MEGASTROKE having more cases than UK Biobank rather than more severe selection bias, although the same instruments were used in both studies. The conditional F-statistics were quite low for lipid modifiers; however, they did not differ by outcome, so they are unlikely to fully explain the difficulty in fully recovering plausible estimates. The multivariable Q-statistics could also be somewhat larger because some samples used to obtain instruments for the exposures overlapped (Sanderson et al., 2019). However, given the very large Q-statistics for the multivariable estimates for stroke using MEGASTROKE and for AF (Table 1), this overlap is unlikely to affect the interpretation. Finally, the multivariable MR-Egger intercepts were not always significant even when the estimates did not look plausible, perhaps because MR-Egger detects exposure specific directional pleiotropy. In contrast, the multivariable Q-statistic assesses heterogeneity across several exposures which if different due to differing selection bias by exposure could contribute to a larger multivariable Cochran’s Q as well as biased estimates.
Here, we have shown theoretically, empirically, and analytically that univariable MR studies can be open to quite severe selection bias arising from selecting on survival of genetically instrumented exposure when other causes of survival and outcome exist, i.e., competing risk before recruitment. Such selection bias is likely to be least for MR studies of harmless exposures recruited shortly after genetic randomization with no competing risk, i.e., studies using birth cohorts with minimal attrition. Conversely, such bias is likely to be most evident for MR studies recruited at older ages examining the effect of a harmful exposure on an outcome subject to competing risk from shared etiology with other common conditions that occur earlier in life. Use of multivariable MR to adjust for major causes of survival and outcome may ameliorate this bias, while simple sensitivity analysis based on information about the exposure and the natural history of disease may help quantify the magnitude of the bias. Infallible methods of obtaining valid MR estimates when the exclusion restriction is invalidated by selection bias stemming from competing risk, which do not require external knowledge, would be helpful.
Data Availability Statement
The original contributions presented in the study are included in the article/Supplementary Material. Further inquiries can be directed to the corresponding author/s. This study only uses publicly available R packages to conduct the analysis. The code used to arrange the data for analysis is available on request.
CS originated the study concept. PL, SAY, and JH explicated the concepts. JZ and ZY contributed substantially to the analysis, and implementation of the concepts. PL and CS wrote the first draft. SAY and JH contributed to the interpretation and presentation. All authors contributed to drafting and revising the article for intellectual content and approved the final version. All authors are accountable for all aspects of the work.
Conflict of Interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
This manuscript has been released as a pre-print at bioRxiv (Schooling et al., 2020).
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fgene.2020.610852/full#supplementary-material
Bowden, J., Davey Smith, G., and Burgess, S. (2015). Mendelian randomization with invalid instruments: effect estimation and bias detection through Egger regression. Int. J. Epidemiol. 44, 512–525. doi: 10.1093/ije/dyv080
Bowden, J., Davey Smith, G., Haycock, P. C., and Burgess, S. (2016). Consistent estimation in mendelian randomization with some invalid instruments using a weighted median estimator. Genet. Epidemiol. 40, 304–314. doi: 10.1002/gepi.21965
Chou, R., Dana, T., Blazina, I., Daeges, M., and Jeanne, T. L. (2016). Statins for prevention of cardiovascular disease in adults: evidence report and systematic review for the US preventive services task force. JAMA 316, 2008–2024. doi: 10.1001/jama.2015.15629
Dudbridge, F., Allen, R. J., Sheehan, N. A., Schmidt, A. F., Lee, J. C., Jenkins, R. G., et al. (2019). Adjustment for index event bias in genome-wide association studies of subsequent events. Nat. Commun. 10:1561.
Emdin, C. A., Callender, T., Cao, J., and Rahimi, K. (2015). Effect of antihypertensive agents on risk of atrial fibrillation: a meta-analysis of large-scale randomized trials. Europace 17, 701–710. doi: 10.1093/europace/euv021
Ettehad, D., Emdin, C. A., Kiran, A., Anderson, S. G., Callender, T., Emberson, J., et al. (2016). Blood pressure lowering for prevention of cardiovascular disease and death: a systematic review and meta-analysis. Lancet 387, 957–967. doi: 10.1016/s0140-6736(15)01225-8
Evangelou, E., Warren, H. R., Mosen-Ansorena, D., Mifsud, B., Pazoki, R., Gao, H., et al. (2018). Genetic analysis of over 1 million people identifies 535 new loci associated with blood pressure traits. Nat. Genet. 50, 1412–1425.
Ference, B. A., Ray, K. K., Catapano, A. L., Ference, T. B., Burgess, S., Neff, D. R., et al. (2019). Mendelian randomization study of ACLY and cardiovascular disease. N. Engl. J. Med. 380, 1033–1042. doi: 10.1056/nejmoa1806747
Forouzanfar, M. H., Alexander, L., Anderson, H. R., Bachman, V. F., Biryukov, S., and Brauer, M. (2015). Global, regional, and national comparative risk assessment of 79 behavioural, environmental and occupational, and metabolic risks or clusters of risks in 188 countries, 1990–2013: a systematic analysis for the Global Burden of Disease Study 2013. Lancet 386, 2287–2323.
Glymour, M. M., and Vittinghoff, E. (2014). Commentary: selection bias as an explanation for the obesity paradox: just because it’s possible doesn’t mean it’s plausible. Epidemiology 25, 4–6. doi: 10.1097/ede.0000000000000013
Hartwig, F. P., Davey Smith, G., and Bowden, J. (2017). Robust inference in summary data Mendelian randomization via the zero modal pleiotropy assumption. Int. J. Epidemiol. 46, 1985–1998. doi: 10.1093/ije/dyx102
Hartwig, F. P., Tilling, K., Davey Smith, G., Lawlor, D. A., and Borges, M. C. (2020). Bias in two-sample Mendelian randomization by using covariable-adjusted summary associations. bioRxiv [Preprint] doi: 10.1101/816363
Holmes, M. V., Ala-Korpela, M., and Davey Smith, G. (2017). Mendelian randomization in cardiometabolic disease: challenges in evaluating causality. Nat. Rev. Cardiol. 14, 577–590. doi: 10.1038/nrcardio.2017.78
Hopewell, J. C., Malik, R., Valdés-Márquez, E., Worrall, B. B., Collins, R., and METASTROKE Collaboration of the ISGC (2018). Differential effects of PCSK9 variants on risk of coronary disease and ischaemic stroke. Eur. Heart J. 39, 354–359. doi: 10.1093/eurheartj/ehx373
Hughes, R. A., Davies, N. M., Davey Smith, G., and Tilling, K. (2019). Selection bias when estimating average treatment effects using one-sample instrumental variable analysis. Epidemiology 30, 350–357. doi: 10.1097/ede.0000000000000972
Larsson, S. C., Mason, A. M., Bäck, M., Klarin, D., Damrauer, S. M., Million Veteran Program, et al. (2020). Genetic predisposition to smoking in relation to 14 cardiovascular diseases. Eur. Heart J. 41, 3304–3310. doi: 10.1093/eurheartj/ehaa193
Malik, R., Chauhan, G., Traylor, M., Sargurupremraj, M., Okada, Y., Mishra, A., et al. (2018). Multiancestry genome-wide association study of 520,000 subjects identifies 32 loci associated with stroke and stroke subtypes. Nat. Genet. 50, 524–537.
Marini, S., Merino, J., Montgomery, B. E., Malik, R., Sudlow, C. L., Dichgans, M., et al. (2020). Mendelian randomization study of obesity and cerebrovascular disease. Ann. Neurol. 87, 516–524. doi: 10.1002/ana.25686
Menotti, A., Puddu, P. E., Tolonen, H., Adachi, H., Kafatos, A., and Kromhout, D. (2019). Age at death of major cardiovascular diseases in 13 cohorts. The seven countries study of cardiovascular diseases 45-year follow-up. Acta Cardiol. 74, 66–72. doi: 10.1080/00015385.2018.1453960
Mills, E. J., Wu, P., Chong, G., Ghement, I., Singh, S., Akl, E. A., et al. (2011). Efficacy and safety of statin treatment for cardiovascular disease: a network meta-analysis of 170,255 patients from 76 randomized trials. QJM 104, 109–124. doi: 10.1093/qjmed/hcq165
Munafo, M. R., Tilling, K., Taylor, A. E., Evans, D. M., and Davey Smith, G. (2017). Collider scope: when selection bias can substantially influence observed associations. Int. J. Epidemiol. 47, 226–235. doi: 10.1093/ije/dyx206
Nelson, C. P., Lai, F. Y., Nath, M., Ye, S., Webb, T. R., Schunkert, H., et al. (2019). Genetic assessment of potential long-term on-target side effects of PCSK9 (Proprotein Convertase Subtilisin/Kexin Type 9) inhibitors. Circ. Genom. Precis. Med. 12:e002196.
Nielsen, J. B., Thorolfsdottir, R. B., Fritsche, L. G., Zhou, W., Skov, M. W., Graham, S. E., et al. (2018). Biobank-driven genomic discovery yields new insight into atrial fibrillation biology. Nat. Genet. 50, 1234–1239.
Nikpay, M., Goel, A., Won, H. H., Hall, L. M., Willenborg, C., Kanoni, S., et al. (2015). A comprehensive 1,000 Genomes-based genome-wide association meta-analysis of coronary artery disease. Nat. Genet. 47, 1121–1130. doi: 10.1038/ng.3396
Nitsch, D., Molokhia, M., Smeeth, L., DeStavola, B. L., Whittaker, J. C., and Leon, D. A. (2006). Limits to causal inference based on Mendelian randomization: a comparison with randomized controlled trials. Am. J. Epidemiol. 163, 397–403. doi: 10.1093/aje/kwj062
Peng, H., Yang, Y., Zhao, Y., and Xiao, H. (2018). The effect of statins on the recurrence rate of atrial fibrillation after catheter ablation: a meta-analysis. Pacing Clin. Electrophysiol. 41, 1420–1427. doi: 10.1111/pace.13485
Rees, J. M. B., Wood, A. M., and Burgess, S. (2017). Extending the MR-Egger method for multivariable Mendelian randomization to correct for both measured and unmeasured pleiotropy. Stat. Med. 36, 4705–4718. doi: 10.1002/sim.7492
Sakaue, S., Kanai, M., Karjalainen, J., Akiyama, M., Kurki, M., Matoba, N., et al. (2020). Trans-biobank analysis with 676,000 individuals elucidates the association of polygenic risk scores of complex traits with human lifespan. Nat. Med. 26, 542–548. doi: 10.1038/s41591-020-0785-8
Sanderson, E., Davey Smith, G., Windmeijer, F., and Bowden, J. (2019). An examination of multivariable Mendelian randomization in the single-sample and two-sample summary data settings. Int. J. Epidemiol. 48, 713–727. doi: 10.1093/ije/dyy262
Schmidt, A. F., Pearce, L. S., Wilkins, J. T., Overington, J. P., Hingorani, A. D., and Casas, J. P. (2017). PCSK9 monoclonal antibodies for the primary and secondary prevention of cardiovascular disease. Cochrane Database Syst. Rev. 4:CD011748.
Schooling, C. M., Lopez, P. M., Yang, Z., Zhao, J. V., Yeung, S. A., and Huang, J. V. (2020). Use of multivariable Mendelian randomization to address biases due to competing risk before recruitment. bioRxiv [Preprint] doi: 10.1101/716621
Smit, R. A. J., Trompet, S., Dekkers, O. M., Jukema, J. W., and le Cessie, S. (2019). Survival bias in mendelian randomization studies: a threat to causal inference. Epidemiology 30, 813–816. doi: 10.1097/ede.0000000000001072
Swanson, S. A., Tiemeier, H., Ikram, M. A., and Hernán, M. A. (2017). Nature as a trialist: deconstructing the analogy between mendelian randomization and randomized trials. Epidemiology 28, 653–659. doi: 10.1097/ede.0000000000000699
Taubes, G. (2018). Researchers find a way to mimic clinical trials using genetics. MIT Technology Review. Available online at: https://www.technologyreview.com/s/611713/researchers-find-way-to-mimic-clinical-trials-using-genetics/ (accessed August 18, 2018).
Valdes-Marquez, E., Parish, S., Clarke, R., Stari, T., Worrall, B. B., Hopewell, J. C., et al. (2019). Relative effects of LDL-C on ischemic stroke and coronary disease: a Mendelian randomization study. Neurology 92, e1176–e1187.
Verbanck, M., Chen, C.-Y., Neale, B., and Do, R. (2018). Detection of widespread horizontal pleiotropy in causal relationships inferred from Mendelian randomization between complex traits and diseases. Nat. Genet. 50:1196. doi: 10.1038/s41588-018-0164-2
Zhou, W., Nielsen, J. B., Fritsche, L. G., Dey, R., Gabrielsen, M. E., Wolford, B. N., et al. (2018). Efficiently controlling for case-control imbalance and sample relatedness in large-scale genetic association studies. Nat. Genet. 50, 1335–1341. doi: 10.1038/s41588-018-0184-y
A possible solution for recovering the causal effect in the presence of selection bias due to selecting on surviving the exposure and competing risk of the outcome in a case–control study.
The fundamental issue with selection bias in a case–control study is that information on the “missing” (or unselected) participants is unknown. Appendix Table 1 shows a possible mechanism generating a biased causal effect due to selection on surviving the exposure (E) and surviving competing risk (CR) of the outcome (D) in a case–control study.
TABLE A1. Possible mechanism for biased causal effects in a case-control study due to selection bias from surviving the exposure and competing risk of the outcome.
Based on the observed data a′, b′, c′, and d′, the observed causal effect of E on D using an OR (ORobs) is,
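Assuming the conventional case–control layout, in which a′ and d′ are the concordant cells (exposed cases and unexposed controls) and b′ and c′ are the discordant cells, an observed OR takes the usual cross-product form. The cell layout is an assumption made here for illustration; the expression holds whichever of b′ and c′ denotes the exposed controls.

```latex
\mathrm{OR}_{\mathrm{obs}} = \frac{a' \, d'}{b' \, c'}
```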
To obtain the true causal effect, we have to recover the data for the whole population, i.e., the birth cohorts who formed the population. Let PE denote the proportion of participants unselected due to E, and let PCR denote the proportion of participants unselected due to CR. Suppose PE and PCR are additive, and 0 < PE + PCR < 1. We can construct the pattern of the unselected participants, as shown in Appendix Table 1. As such, the causal effect of E on D for the whole population can be estimated as follows,
This relationship will be invalid if we replace the OR with a risk ratio.
Notably, the level of bias depends on the magnitude of the OR. A small OR, of the order of 1.05, as is typical in a genetic study, is much more vulnerable to a reversal of effect from selection bias due to selecting on surviving the exposure and surviving competing risk of the outcome than a larger OR, of the order of 1.50, as is typical in traditional observational studies. To clarify, Appendix Figure 1 shows the observed OR plotted against the true OR for different combinations of selection on surviving the exposure (PE) and selection on surviving competing risk of the outcome (PCR).
Figure A1. Observed odds ratio against the true odds ratio in the presence of different proportions of death before recruitment due to the exposure (PE) and different proportions of death before recruitment due to competing risk of the outcome (PCR) for true odds ratios larger than 1 (left hand side) and smaller than 1 (right hand side, obtained by taking the inverse of the odds ratio).
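The greater vulnerability of small ORs to a reversal of effect can be illustrated numerically. The following is a minimal sketch under an assumed selection pattern, which is not necessarily the pattern in Appendix Table 1: exposed participants lose a proportion PE before recruitment, participants who would go on to have the outcome lose a proportion PCR, and exposed participants who would have the outcome lose PE + PCR (the additive case). The function names, the cell layout, and the cell counts are hypothetical.

```python
def odds_ratio(a, b, c, d):
    """Cross-product odds ratio for a 2x2 table.

    Assumed layout: a = exposed cases, b = exposed non-cases,
    c = unexposed cases, d = unexposed non-cases.
    """
    return (a * d) / (b * c)


def select_survivors(a, b, c, d, p_e, p_cr):
    """Apply the assumed additive selection pattern to a full-population table.

    p_e  : proportion lost before recruitment because of the exposure
    p_cr : proportion lost before recruitment because of competing risk
    """
    if p_e < 0 or p_cr < 0 or p_e + p_cr >= 1:
        raise ValueError("need p_e >= 0, p_cr >= 0 and p_e + p_cr < 1")
    return (
        a * (1 - p_e - p_cr),  # exposed cases lose both proportions
        b * (1 - p_e),         # exposed non-cases lose p_e only
        c * (1 - p_cr),        # unexposed cases lose p_cr only
        d,                     # unexposed non-cases are all recruited
    )


def observed_or(true_or, p_e, p_cr, n=1000.0):
    """Observed OR after selection, starting from a table with the given true OR."""
    full_table = (true_or * n, n, n, n)  # hypothetical counts with OR = true_or
    return odds_ratio(*select_survivors(*full_table, p_e, p_cr))


# Compare a small OR (typical of genetic studies) with a larger one.
for true_or in (1.05, 1.50):
    print(true_or, round(observed_or(true_or, p_e=0.2, p_cr=0.2), 3))
```

Under these assumptions, with PE = PCR = 0.2 a true OR of 1.05 is observed as about 0.98, a reversal of direction, whereas a true OR of 1.50 is observed as about 1.41 and keeps its direction.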
Keywords: selection bias, competing risk, Mendelian randomization, shared etiology, instrumental variable analysis
Citation: Schooling CM, Lopez PM, Yang Z, Zhao JV, Au Yeung SL and Huang JV (2021) Use of Multivariable Mendelian Randomization to Address Biases Due to Competing Risk Before Recruitment. Front. Genet. 11:610852. doi: 10.3389/fgene.2020.610852
Received: 27 September 2020; Accepted: 01 December 2020;
Published: 15 January 2021.
Edited by: Lei Zhang, Soochow University, China
Reviewed by: Roelof Smit, Icahn School of Medicine at Mount Sinai, United States
Tomas Drgon, United States Food and Drug Administration, United States
Copyright © 2021 Schooling, Lopez, Yang, Zhao, Au Yeung and Huang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: C. M. Schooling, email@example.com