On multi-marker tests for association in case-control studies

Genome-wide association studies (GWAs) have identified thousands of DNA loci associated with a variety of traits. Statistical inference is almost always based on single marker hypothesis tests of association and the respective p-values with Bonferroni correction. Since commercially available genomic arrays interrogate hundreds of thousands or even millions of loci simultaneously, many causal yet undetected loci are believed to exist because the conditional power to achieve a genome-wide significance level can be low, in particular for markers with small effect sizes and low minor allele frequencies and in studies with modest sample size. However, the correlation between neighboring markers in the human genome due to linkage disequilibrium (LD) resulting in correlated marker test statistics can be incorporated into multi-marker hypothesis tests, thereby increasing power to detect association. Herein, we establish a theoretical benchmark by quantifying the maximum power achievable for multi-marker tests of association in case-control studies, achievable only when the causal marker is known. Using that genotype correlations within an LD block translate into an asymptotically multivariate normal distribution for score test statistics, we develop a set of weights for the markers that maximize the non-centrality parameter, and assess the relative loss of power for other approaches. We find that the method of Conneely and Boehnke (2007) based on the maximum absolute test statistic observed in an LD block is a practical and powerful method in a variety of settings. We also explore the effect on the power that prior biological or functional knowledge used to narrow down the locus of the causal marker can have, and conclude that this prior knowledge has to be very strong and specific for the power to approach the maximum achievable level, or even beat the power observed for methods such as the one proposed by Conneely and Boehnke (2007).


INTRODUCTION
Genome-wide association studies (GWAs) are a prominent approach to search for single-nucleotide polymorphisms (SNPs) associated with disease or other phenotypes. To date, results from more than 1000 GWAs have been reported, identifying over ten thousand DNA loci to be statistically associated with one or more of hundreds of phenotypes investigated (http://www.genome. gov/gwastudies). Typically test statistics and p-values are reported for each marker on the genomic array, and genome-wide significance for a SNP is declared if the p-value after Bonferroni correction is below a threshold for a desired family-wise error rate. Commercially available genomic arrays interrogate the genotypes of individuals at hundreds of thousands or even millions of loci, and p-values less than 5 × 10 −8 are usually required to achieve genome-wide significance. Obviously, these levels of significance are difficult to reach unless the signal is very strong or the sample size is very large. However, the correlation between neighboring markers in the human genome due to linkage disequilibrium (LD) can be incorporated into statistical tests, and thereby increase the power to detect association under the same family-wise error rate.
Reducing test-multiplicity by taking advantage of the observed marker correlation in LD blocks has been a very active field of research. Haplotype-based methods can be an attractive option to decrease the testing burden (Schaid et al., 2002;Chapman et al., 2003), especially in settings where genetic diversity between subjects is low and/or markers are densely typed. However, most approaches avoid the phasing step for haplotype estimation and use the observed genotypes and/or the respective marginal test statistics and p-values instead to generate a single test statistic and p-value for the entire LD block. The approaches most similar to the traditionally employed Bonferroni method are those that estimate the "effective number of tests" based on the correlation structure and use those instead of the actual number of tests to control for the family-wise error rate (Nyholt, 2004;Li and Ji, 2005;Moskvina and Schmidt, 2008). Fisher's inverse chi-square test statistic (Fisher, 1932) is another choice to quantify departure from randomness in a set of multiple p-values. However, for correlated data such as the p-values stemming from the markers in an LD block the inference has to be based on a proper null distribution generated either by permutations (Chapman and Whittaker, 2008) or adjustments to the degrees of freedom in the χ 2 distribution (Makambi, 2003;Chai et al., 2009). Other methods based on the observed genotypes include some traditional multivariate procedures such as Hotelling's T 2 -test (Xiong et al., 2002), principal components analysis (Horne and Camp, 2004;Gauderman et al., 2007) and Fourier transformations (Wang and Elston, 2007), but also concepts borrowed from the statistical learning and regularization literature, such as kernel methods (Schaid et al., 2005;Kwee et al., 2008;Mukhopadhyay et al., 2010;Wu et al., 2010;Pan, 2011), penalized regression (Basu et al., 2011), and the LASSO (Shi et al., 2007;Wu et al., 2009). Further, biostatistical concepts employed include latent variables (Wang et al., 2009a), empirical Bayes methods (Goeman et al., 2006), likelihood ratio tests that simultaneously compare genotype means and variances across cases and controls (Wang et al., 2009b), and even hybrids that combine several of those approaches (Pan et al., 2010). While the above methods are based on the observed data only, other approaches also include additional information such as publicly available data bases (Li et al., 2009) or gene sets and ontologies Chasman, 2008;Holden et al., 2008;O'Dushlaine et al., 2009). The results of some comparisons of multi-marker tests in case-control studies to detect association with SNP sets have been reported, for example by Chapman and Whittaker (2008) and Ballard et al. (2010).
Despite the advances made in methods development for multimarker tests and the additional power that can be gained, the standard approach to analyze GWAs data still is to carry out single marker tests with Bonferroni correction. The somewhat limited use of the novel statistical methods is arguably due in part to the fact that some of these methods can be computationally demanding or that open source software is not always available. However, there are powerful multi-marker tests that are very easy to implement and scale, including the approaches proposed by Seaman and Müller-Myhsok (2005) and Conneely and Boehnke (2007). Both methods are based on marginal score tests for each SNP, and the authors demonstrate how genotype correlations within an LD block translate into an asymptotically multivariate normal distribution for the test statistics, with a variance-covariance derived from the estimates of LD. As an alternative to computationally intensive permutation tests, Seaman and Müller-Myhsok (2005) propose to sample from this multivariate distribution to calculate the statistical significance of an observed test statistic, while Conneely and Boehnke (2007) propose to directly use the multivariate cumulative distribution function to calculate p-values, particularly for the multi-marker test based on the maximum of the absolute values of the observed marginal test statistics. Computational procedures to assess multivariate normal cumulative distribution functions are readily available, for example as implemented in the statistical software environment R (Genz and Bretz, 2009;Genz et al., 2011). Both of these approaches are completely data driven and do not require prior biological knowledge or external reference data.
In what follows, we quantify the maximum power achievable for multi-marker tests to detect association in case-control studies, which relies on the hypothetical assumption that the locus of the causal marker in an LD block is known. Similar to the derivations in Seaman and Müller-Myhsok (2005) and Conneely and Boehnke (2007) we show that genotype correlations within an LD block translate into an asymptotically multivariate normal distribution for the score test statistics, and develop a set of weights for the markers that maximize the non-centrality parameter in the overall test statistic. We assess the relative loss of power of some alternative, data driven, and thus practical methods without such prior knowledge. We also use some simulations to explore the effect on the power that prior knowledge used to narrow down the locus of the causal marker has, and how much of the maximum achievable power it can reach.

SCORE TEST STATISTICS AND CORRELATION STRUCTURES
In a case-control setting we assume the retrospective risk relation to be where x ∈ {0, 1} is the fixed binary disease status indicator, and G is a function of the genotype that specifies the genetic model. In the following we assume that G ∈ {0, 1}, for example encoding a dominant model for a bi-allelic marker (the more general coding is considered in the supplementary materials), but for simplicity still refer to G as the genotype. In this setting G is the random variable with E(G) = π, the relative frequency of G = 1 in the study population. As usual, the parameters μ G and θ G describe the relationship between x and π via the link F, with θ G being the parameter of interest. We denote the disease status indicator for individual i ∈ {1, . . . , n} by x i , and the genotype for individual i by g i . Thus, x i is equal to the number of cases and n − x i is equal to the number of controls. Henceforth, we assume that F is inverse-logit. To test the hypothesis of no genotype/phenotype association at a specific marker H 0 : θ G = 0 (or equivalently, H 0 : π 0 = π 1 = π) we use the score test statistic wherex = 1 n x i andπ =ḡ (introduced for example in Agresti, 2012). In a study with an equal number of cases and controls we havex = 1/2 and thus, the above simplifies to whereπ 1 andπ 0 are the sample means for g in the cases and controls, respectively. Under the null hypothesis θ G = 0, the random variable Z G has mean 0 and variance 1 and its distribution is approximately normal for sufficiently large n.
In addition to G, consider a second marker H and let ξ x = Pr(H = 1|x). As in Equation (2) withξ =h. Setting E(H) = ξ, the conditional distribution of H given G is p h|g = Pr(H = h|G = g), providing a measure of the LD between G and H. Consequently ξ = p 1|0 + π p 1|1 − p 1|0 and In the following, we derive the relation between the correlation of the test statistics Z G and Z H and the correlation between G and H under the null hypothesis (θ G = 0) and local alternatives, and defer the derivations for global alternatives to the supplementary material.
Under the null hypothesis of no association, cov(Z G , Z H ) = E(T G T H ). Using Equations (2) and (4) we have that The last line follows from Assuming that the participants are unrelated and thatπ ≡ π, the above expectation simplifies to which is equal to zero if p 1|0 = p 1|1 = p (no linkage between G and H), and equal to nx(1 −x)π(1 − π) if p 1|0 = 0 and p 1|1 = 1 (perfect linkage). If π and ξ were known then the denominators D G and D H are constants, and thus Thus, under the null hypothesis and local alternatives, subject to the approximation thatπ andξ are constants, the correlation between the test statistics at two markers equals the correlation between the marker genotypes. If there is no correlation between G and H (linkage equilibrium) and H is not independently causal, then H is not associated with disease status. If H is in LD with G, an association with the disease status is induced by G. For a local alternative (θ G = O(1/ √ n)) a first-order Taylor series approximation yields If G is "causal" for the trait of interest but H is not, then the correlation between G and H induces a non-zero θ H in the score test. Specifically, where the last line follows fromξ = p 1|0 +π × (p 1|1 − p 1|0 ). Note that H is induced, so that in case of linkage equilibrium (p 1|1 = p 1|0 ) we have H = 0. Also note that θ H depends on both the odds ratio at the causal marker (θ G ) and the covariation of the genotypes.

MULTI-MARKER TESTS FOR ASSOCIATION
Let Z = (Z 1 , . . . , Z K ) be the score test Z statistics from an LD block with K markers.

The maximum z-statistic Z max
We define the maximum z-statistic as Z max = max 1≤k≤K {|Z k |}. The null distribution of Z max depends on the correlation matrix R of the test statistics Z, and for large samples we have Z ∼ N K (0, R). The two-sided p-value for Z max can be derived from this multivariate distribution by calculating where R is the cumulative distribution function of the multivariate normal distribution with mean vector 0 and correlation matrix R, 1 K is a vector of ones of length K, and the symbol • denotes the dot product.

The Bonferroni corrected p-value
We compute the Bonferroni p-value in a set of K markers as K times the p-value stemming from the most significant marker as given by Z max = max 1≤k≤K {|Z k |}.

The optimal linear combination Z opt
We consider a block of correlated markers as the region of interest and assume that one of these SNPs is biologically associated with the trait of interest. The statistical associations at neighboring markers are thus controlled by the strength of the correlation between the causal marker and other loci in the analysis. With locus-specific Z-scores being approximately normally distributed, a linear combination (L Z) is optimal. Generalizing from the two-locus case to a block of K SNPs with π k = pr( The values B k are known and depend on the minor allele frequency, but in general the θ k are unknown. The optimal linear combination depends on the relative sizes of the k and so an assumption on the relative sizes of the θ k is needed, and certain cases are discussed below and in the next section. We let Z denote the vector of the Z k and denote the vector of the k . We need to identify the K-dimensional vector L opt that maximizes the non-centrality {E(L Z)} 2 = L L subject to L RL = 1, with the correlation matrix R computed from the genotype correlation structure. This is equivalent to finding theL that maximizesL HL subject toL L = 1, where L = R −1/2L and H = R −1/2 R −1/2 . A standard matrix theory result (which can formally be derived using Lagrange multipliers) yields thatL is the normalized first principal component loading vector of H, and we have L opt = R −1/2L so that Z opt = L opt Z.
Note that L opt depends only on the relative sizes of the s. If k ≡ , then L opt = R −1 1/(R −1 ++ ) 1/2 where R −1 ++ is the sum of all entries of R −1 . Further, in the case that all minor allele frequencies in the block are the same, then all B k are identical and is the product of a constant and the row (or column) of R corresponding to the locus which is biologically associated with the trait of interest. (This follows from an extension of Equations (10) and (11) to more than two markers.) In this case, H has a degenerate form with all but one entry (the entry on the diagonal position corresponding to that of the causal locus) equal to 0. This yields that L opt is simply a vector equal to 1 at the causal locus, and zero otherwise, i.e., Z opt = Z causal for sets of markers with equal minor allele frequency. Thus, even though the associations in the remaining markers of the block are only induced by the causal SNP, there is in general more information in the optimal linear combination of score statistics than in the statistic from the causal locus alone, since the minor allele frequencies across a set of markers are virtually always non-identical, unless the markers are in perfect LD (and thus, every marker contains the same information about statistical association with the phenotype).
We also note that the expectation of the optimal linear combination is E(Z opt ) = L opt = ( R −1 ) 1/2 , and thus the noncentrality is If K is very large, care is needed in computing R −1 . However, in most situations either R will be relatively small (limited to the size of an LD block) or will have considerable structure with many zeros and non-communicating subsets and so only matrices of small to medium size will need to be inverted.

The "agnostic" linear combination Z eq
As in the case of the optimal linear combination Z opt above we consider a linear combination of z-scores, albeit without any prior knowledge of the location of any causal variant. This lack of knowledge comes into play in choosing a value of , and we assume a uniform prior over the set of possible causal variants. In this case, the expected non-centrality is is the vector of expected values of the test statistics, assuming that marker k is the causal marker. More specifically, (k) is the k th column of R that has been component-wise multiplied by the B k s. In this case, we proceed as described above to find L opt , with our matrix H given by This approach maximizes the pre-posterior expected noncentrality, although this does not guarantee better performance than Z max .

The sequence kernel association test (SKAT)
For comparison with the above methods, we also include the sequence kernel association test (SKAT), a widely used method for SNP-set analysis based on a logistic kernel-machine approach that allows for flexible, covariate adjusted relations between a genotype and the outcome of interest (Wu et al., 2010). Analyses were carried out using the publicly available SKAT R package (http://cran.r-project.org/web/packages/SKAT) with default settings that produce a linearly weighted kernel, with weights inversely proportional to minor allele frequency.

SIMULATIONS BASED ON ASSUMED LD AND ALLELE FREQUENCIES
We simulated a "naive" population under a dominant disease model using the R package bindata (http://cran.r-project. org/web/packages/bindata). We simulated 5-locus haplotype blocks with exchangeable (compound symmetry, CS) and autoregressive lag-1 (AR1) correlation structures, with correlations between 0 and 0.8. For all markers in the haplotype blocks we chose constant minor allele frequencies in this simulation, set at either 5 or 25%. One causal marker was selected and haplotypes were sampled to generate cases and controls as given by the genetic risk model, using a variety of odds ratios (1, 1.1, 1.4, and 1.7). We generated 50,000 samples of 1000 cases and 1000 controls, and carried out marker-specific score tests to generate sets of test statistics. We investigated the type I error and power for the approach using the maximum z statistic, the optimal linear combination of the test statistics with known causal locus (comb opt ), and the "agnostic" linear combination of the test statistics assuming equal prior probabilities for each marker in the block to be causal (comb eq ). In addition, to mimic some limited biological information available, we show results for a linear combination of test statistics assuming equal prior probabilities for the causal and one additional marker, narrowing the set of potentially causal markers to two out of five (comb pair ). For the compound symmetry simulations each pair of markers that contains the causal one is equivalent. For the auto-regressive lag-1 simulations, we show a pair of markers with correlation ρ and a pair of markers with correlation ρ 2 . Estimates of type I error and power are the fraction of simulations with p-values lower than the set significance level assuming two-sided tests. We also include the results derived for the Bonferroni correction with the significance level divided by the number of markers assessed. In addition to the typical significance level α = 0.05, we also assessed the different methods using a much stricter significance level for type I error control, as is usually done in GWAs. These extreme tail probabilities were estimated using importance sampling (see supplementary material).
With the exception of the conservative Bonferroni correction all methods were well calibrated under the null hypothesis, for both types of correlations (compound symmetry and auto-regressive) and both minor allele frequencies considered (Figure 1). For much stronger type-I error control however all approaches can be slightly conservative, in particular in settings with auto-regressive correlation structure (see supplementary material).
As expected, the optimal linear combination with correctly specified causal locus outperforms all other methods, yielding the largest power for odds-ratios of 1.1, 1.4, and 1.7. For this method, the estimated power was virtually constant for all magnitudes of correlations across markers within a block, for both simulated compound symmetry at low (Figure 2) and high (Figure 3) minor allele frequencies, as well as auto-regressive correlation structures (Figures 4, 5, respectively). Also as expected, the relative loss of power for the other methods is worst for uncorrelated markers, and decreases with increasing correlation. For perfectly correlated markers, all methods except Bonferroni are equivalent. The data-driven maximum z-statistic which does not require any biological knowledge or other input generally performs better than comb eq and Bonferroni, and thus, is a practical and more powerful method than conventionally employed approaches.
Not surprisingly, the relative power of the Bonferroni method is particularly poor for highly correlated markers with compound symmetry when the genetic signal is weak (Figures 2, 3, left column). If the prior information about the locus for the causal variant is not very strong, little if any improvement can be achieved compared to the maximum z-statistic method. In the hypothetical case when the causal locus can be narrowed down to one of two loci in the LD block, comb pair occasionally yields modestly higher power than the maximum z-statistic method, but this is typically only the case when the two markers are in strong LD. However, for large effect sizes and weak correlations in the LD block, comb pair performs at times even worse than multiple comparison correction via Bonferroni, and particularly noticeable at low minor allele frequencies (Figures 2, 4, right columns). As expected, comb pair with ρ always yields higher power than comb pair with ρ 2 in the auto-regressive setting (Figures 4, 5). The performance of comb eq is arguably the worst, and particularly poor in settings with high minor allele frequencies and strong signal (Figures 3, 5, right columns).

SIMULATIONS BASED ON GENOMIC DATA
As realistic examples of LD and minor allele frequencies, we also simulated data based on haplotypes in two regions of chromosome 10 and one region on chromosome 22 delineated from the genome scans (Illumina Human660W-Quad BeadChip array) of 4251 European American participants in the Lung Health Study, a NHLBI-supported multi-center randomized clinical trial in the United States and Canada to determine whether or not a program of smoking intervention and use of an inhaled bronchodilator could slow the rate of decline in pulmonary function in smokers with mild airflow limitation (Kanner et al., 1999).

FIGURE 1 | Calibration under the null hypothesis for compound symmetry (left) and auto-regressive (right) correlation structures, assuming equal minor allele frequencies of 5% (top) and 25% (bottom) in a block of 5 markers.
The fraction of rejected null hypotheses of no association (estimated type I error, y-axis) simulated in 50,000 data sets each is shown as a function of the between-marker correlation (x-axis). Each line represents a different method (see the following figures for details), with the Bonferroni method (red) showing the sharp decline as ρ increases.

www.frontiersin.org
December 2013 | Volume 4 | Article 252 | 5 FIGURE 2 | Comparing analytic strategies in the setting with compound symmetry correlation structures, assuming equal minor allele frequencies of 5% in a block of 5 markers. The fraction of rejected null hypotheses of no association (power, y-axis) in 50,000 simulated data sets is shown as a function of the between-marker correlation (x-axis) for assumed odds-ratios of 1.1 (left), 1.4 (middle), and 1.7 (right). Each row highlights a different method, as labeled along the right-hand side.
On chromosome 10 we chose two smaller blocks of 8 and 7 markers respectively (top of Table 1 with lower minor allele frequencies and weaker LD; top of Table 2 with larger minor allele frequencies and stronger LD). On chromosome 22 we chose a larger block of 24 markers (mean R 2 of 0.35, minor allele frequencies between 0.07 and 0.41, median of 0.28, see supplementary material). This block is part of a previously identified region of strong LD observed in a Caucasian population (Dawson et al., 2002). We used inferred haplotypes in these regions to simulate case-control data sets with varying degrees of association between a hypothetical causal locus and the phenotype. For each block and a set of five different effect sizes (odds-ratios of 1, 1.1, 1.25, 1.4, and 1.7) we generated 50,000 samples of 1000 cases and 1000 controls, and carried out marker-specific score tests to generate sets of test statistics.
In the eight marker block with the lower minor allele frequencies and weaker LD, we observed that all methods were well-calibrated under the null hypothesis (odds ratio equal to 1), with the exception of the conservative Bonferroni correction. The optimal linear combination with correctly specified causal locus again outperformed all other methods, yielding the largest power for odds-ratios of 1.1, 1.25, 1.4, and 1.7 (bottom simulated data sets is shown as a function of the between-marker correlation (x-axis) for assumed odds-ratios of 1.1 (left), 1.4 (middle), and 1.7 (right). Each row highlights a different method, as labeled along the right-hand side.
of Table 1). The data-driven maximum z-statistic which does not require biological knowledge or other input showed a slight loss in power compared to the optimal method, however performed substantially better than any of the other approaches considered, including the hypothetical cases where the causal locus could be narrowed down to one of two loci (comb pair in Table 1), even when those two markers were in somewhat strong LD (correlation of 0.68 between marker 4 and the causal locus 5). In this example, the "paired" approach performed even worse than multiple comparison correction via Bonferroni.
For the seven marker block with the larger minor allele frequencies and stronger LD we observed similar results (bottom of Table 2). However, the hypothetical case where the causal locus could be narrowed down to one of two loci yielded higher power when the two markers were in strong LD (correlation of 0.81 between marker 3 and the causal locus 4). Interestingly, the sequence kernel association test showed very different performances for these two simulation scenarios. While properly calibrated under the null, the power for the simulation on the block of eight markers with overall lower minor allele frequencies and weaker LD was substantially lower than www.frontiersin.org December 2013 | Volume 4 | Article 252 | 7 FIGURE 4 | Comparing analytic strategies in the setting with autoregressive correlation structures, assuming equal minor allele frequencies of 5% in a block of 5 markers. The fraction of rejected null hypotheses of no association (power, y-axis) in 50,000 simulated data sets is shown as a function of the between-marker correlation (x-axis) for assumed odds-ratios of 1.1 (left), 1.4 (middle), and 1.7 (right). Each row highlights a different method, as labeled along the right-hand side.
the power of most of the other methods (Table 1). On the other hand, for the simulation on the block of seven markers with larger minor allele frequencies and stronger LD, the performance was very competitive. One possible explanation for this behavior is the weighting scheme in SKAT -by default, low frequency variants carry higher weights than common variants. In the first setting, the marker with the lowest minor allele frequency (i.e., the highest weight) has only a very simulated data sets is shown as a function of the between-marker correlation (x-axis) for assumed odds-ratios of 1.1 (left), 1.4 (middle), and 1.7 (right). Each row highlights a different method, as labeled along the right-hand side.
weak correlation to the causal, much more variable SNP (ρ = −0.14), while in the second setting all markers have about the same minor allele frequency, and are much more strongly correlated.
The simulation on the 24 marker block yields similar results in general, but some qualitative differences are noteworthy. The optimal linear combination again performs best overall, although for odds ratios of 1.4 the maximum z-statistic shows slightly www.frontiersin.org December 2013 | Volume 4 | Article 252 | 9 higher power (Table 3), likely due to the somewhat fragmented LD across the 24 markers observed in our population (see supplementary figures). As before, the "paired" approach performs well when the two markers are in strong LD (the causal markers 12 and marker 13 have an R 2 of 0.97), and yields unsatisfactory power when the markers are not in strong LD (R 2 of 0.05 between markers 12 and 3). The performance of SKAT again is affected by the distribution of minor allele frequencies in the block and the observed LD. While the causal marker 12 has an appreciable minor allele frequency of 0.28, some other markers show much less variation, and thus receive more weight in the default settings.
Here, the lowest minor allele frequency (MAF = 0.07) is observed for marker 3, which is in very low LD with the causal marker (R 2 of 0.05). Overall, and similar to previous results, the largest gain of power for the optimal linear combination relative to the other methods is seen at lower odds ratios.

DISCUSSION
We evaluated three approaches to controlling multiplicity in GWAs: standard Bonferroni, the correlation-calibrated maximum statistic, and a theoretical benchmark: the optimal linear combination of locus specific test statistics which requires knowledge of the causal locus. Computation of the latter two depends on the correlation among the test statistics; the performance of each  depends on this correlation. We reiterate that the correlation among the test statistics is essentially identical to the biological correlation amongst the genotypes (the LD structure) and this can be estimated. For an additional comparison to the above methods, we included the sequence kernel association test (SKAT), a widely used method for SNP-set analysis based on a logistic kernel-machine approach that allows for flexible, Frontiers in Genetics | Statistical Genetics and Methodology December 2013 | Volume 4 | Article 252 | 10 covariate adjusted relations between a genotype and the outcome of interest. Simulations show that the two correlation-dependent approaches are well-calibrated under the null hypothesis. As expected, unless the correlations are very small, the Bonferroni approach is conservative. In the context of the test Z-scores being well approximated by a multivariate normal distribution, the optimal linear combination dominates all other approaches, but this optimality is quite fragile, depending on having identified the causal locus or one in high LD with it. If the causal locus is poorly selected, our linear combination using "best guess" weights as one example, the properly-calibrated Max statistic often performs better, sometimes by a substantial amount with these relations depending on the magnitude of the correlations, their pattern (compound symmetry or auto-regressive), and the magnitude of the genotype-phenotype association. We do see gains in power using a linear combination where we have narrowed down the set of candidate loci in our block, particularly in the case of very small effect sizes. The haplotype-based simulations produce similar comparisons, but with generally smaller differences amongst the approaches.
Overall, the calibrated maximum method is very effective at maintaining power compared to use of a linear combination. However, when the causal locus is correctly specified, the optimal linear combination can confer a considerable increase in power. Therefore, there is some room for error and we have also provided an approach by which it is possible to specify prior probabilities on the loci and then use the induced, optimal linear combination. As we have shown, if there are two markers in a block that have a higher prior likelihood of being associated with disease (e.g., due to damaging functional prediction), putting higher weights, or all weight, on these will provide robustness to mis-specification of the causal locus, while providing more power than the Max test in some cases, especially for very low effect sizes. However, our equally weighted case is equivalent to giving each locus equal prior probability and its generally poor performance indicates that some focus is needed.
We also found that the sequence kernel association test (SKAT) run with its default values is a very competitive method in settings when LD within a block is strong-which also implies similar minor allele frequencies between markers, as high R 2 values are mathematically not possible between SNPs of very different allele frequencies. On the other hand, the power in the simulations with lower minor allele frequencies and weaker LD was lower for SKAT than the power of both the maximum and the optimal linear combination tests. We conjecture that this is due to the default weighting scheme in SKAT-up-weighting less common variants-while in our simulation the marker with the lowest minor allele frequency and thus the highest weight had only a very weak correlation to the assumed causal marker. Thus, we believe that the default SKAT is most useful for blocks with high LD, and for association tests under the common assumption of higher penetrance for lower allele frequency variants. We also note that SKAT allows for weighted burden tests, which we did not consider in this manuscript.
One challenge for all methods of this type is the dependence on having pre-defined blocks of interest. There are several existing methods for estimating LD-structures (e.g., Stephens et al., 2001;Gabriel et al., 2002;Browning and Browning, 2009) which can be used to identify LD-blocks. Here, we have estimated the LD by computing correlation values of the encoded genotypes using the data set at hand, rather than external databases, which avoids incorporating mis-specified structure due to differences in sample populations compared to an external reference. This is in contrast to the often recommended usage of external data, and it will be informative to investigate in detail the impact of ambiguous LD blocks (such as the 24 marker block from chromosome 22) for any of the considered methods.