Gene-based multiple regression association testing for combined examination of common and low frequency variants in quantitative trait analysis

Multi-marker methods for genetic association analysis can be performed for common and low frequency SNPs to improve power. Regression models are an intuitive way to formulate multi-marker tests. In previous studies we evaluated regression-based multi-marker tests for common SNPs, and through identification of bins consisting of correlated SNPs, developed a multi-bin linear combination (MLC) test that is a compromise between a 1 df linear combination test and a multi-df global test. Bins of SNPs in high linkage disequilibrium (LD) are identified, and a linear combination of individual SNP statistics is constructed within each bin. Then association with the phenotype is represented by an overall statistic with df as many or few as the number of bins. In this report we evaluate multi-marker tests for SNPs that occur at low frequencies. There are many linear and quadratic multi-marker tests that are suitable for common or low frequency variant analysis. We compared the performance of the MLC tests with various linear and quadratic statistics in joint or marginal regressions. For these comparisons, we performed a simulation study of genotypes and quantitative traits for 85 genes with many low frequency SNPs based on HapMap Phase III. We compared the tests using (1) set of all SNPs in a gene, (2) set of common SNPs in a gene (MAF ≥ 5%), (3) set of low frequency SNPs (1% ≤ MAF < 5%). For different trait models based on low frequency causal SNPs, we found that combined analysis using all SNPs including common and low frequency SNPs is a good and robust choice whereas using common SNPs alone or low frequency SNP alone can lose power. MLC tests performed well in combined analysis except where two low frequency causal SNPs with opposing effects are positively correlated. Overall, across different sets of analysis, the joint regression Wald test showed consistently good performance whereas other statistics including the ones based on marginal regression had lower power for some situations.

Multi-marker methods for genetic association analysis can be performed for common and low frequency SNPs to improve power. Regression models are an intuitive way to formulate multi-marker tests. In previous studies we evaluated regression-based multi-marker tests for common SNPs, and through identification of bins consisting of correlated SNPs, developed a multi-bin linear combination (MLC) test that is a compromise between a 1 df linear combination test and a multi-df global test. Bins of SNPs in high linkage disequilibrium (LD) are identified, and a linear combination of individual SNP statistics is constructed within each bin. Then association with the phenotype is represented by an overall statistic with df as many or few as the number of bins. In this report we evaluate multi-marker tests for SNPs that occur at low frequencies. There are many linear and quadratic multi-marker tests that are suitable for common or low frequency variant analysis. We compared the performance of the MLC tests with various linear and quadratic statistics in joint or marginal regressions. For these comparisons, we performed a simulation study of genotypes and quantitative traits for 85 genes with many low frequency SNPs based on HapMap Phase III. We compared the tests using (1) set of all SNPs in a gene, (2) set of common SNPs in a gene (MAF ≥ 5%), (3) set of low frequency SNPs (1% ≤ MAF < 5%). For different trait models based on low frequency causal SNPs, we found that combined analysis using all SNPs including common and low frequency SNPs is a good and robust choice whereas using common SNPs alone or low frequency SNP alone can lose power. MLC tests performed well in combined analysis except where two low frequency causal SNPs with opposing effects are positively correlated. Overall, across different sets of analysis, the joint regression Wald test showed consistently good performance whereas other statistics including the ones based on marginal regression had lower power for some situations.

INTRODUCTION
Recently, many multi-marker methods have been developed for the analysis of rare SNPs. Among them, one class of tests is called "collapsing method" or "linear statistics" (Derkach et al., 2013). These statistics combine the individual SNP-based scores linearly with various weights. The cohort allelic sum test (CAST) (Morgenthaler and Thilly, 2007), CMC test (Li and Leal, 2008), the weighted sum test (Madsen and Browning, 2009) are wellknown linear statistics. Linear statistics work well when the combined alleles are mostly deleterious or mostly protective, but when the rare variants include a substantial portion of protective and deleterious effects, they will lose power. The multi-marker tests based on a sum of squared terms are called "quadratic statistics" (Derkach et al., 2013). C-alpha test (Neale et al., 2011), SKAT (Wu et al., 2011) and SSU tests (Pan, 2009) are popular ones in this class, and are usually robust to the occurrence of deleterious and protective variants among multiple associated SNPs. Derkach et al. (2013) evaluated various linear and quadratic statistics and found that linear statistics can be powerful for specific situations but quadratic statistics have robustness to a wide range of trait model scenarios. Both Ladouceur et al. (2012) and Derkach et al. (2013) concluded that there is no single method that is consistently more powerful than other methods.
The multi-marker methods mentioned above are constructed from the marginal association analysis of the trait phenotype with each individual SNP. Alternatively, global statistics can be constructed from joint analysis of multiple SNPs in a multiple regression model. In previous studies, we developed a regression-based multi-marker method that combines linear and quadratic components using bins defined by the linkage disequilibrium (LD) patterns within a gene (Yoo et al., 2013). Regression analysis with multiple SNPs is performed and a global test statistic is constructed from the beta coefficients and associated covariance matrix. The multi-bin linear combination (MLC) statistic takes a weighted linear combination of SNPs effects within a bin of highly correlated SNPs and a quadratic function across bins as a sum of squared within-bin linear combinations. The MLC typically requires an algorithm to adjust the coding of risk and base alleles such that SNPs within a bin are positively correlated, as far as this is possible. In comparison to alternative methods, we found the MLC tests to have relatively good power and robustness under various one and two causal SNP trait models across a wide range of gene structures. Several other multi-marker statistics based on marginal regression analysis such as MinP and SSB (Pan, 2009) also compared in Yoo et al. (2013) showed good power, except for the genes with weakly correlated SNPs (that is, with low LD).
Since MLC is constructed from multi-SNP regression analysis of categorical explanatory variables, we anticipated that the MLC test would be mainly suitable for detecting association with common SNPs, assuming all SNPs, both causal and tagging, are common (MAF ≥ 5%). However, if a large sample size is available, it may be feasible to analyse low frequency variants that have 1% < MAF < 5% with the aim of detecting genes that harbor low frequency causal variants as well as common causal variants. Some multi-marker tests for rare-variant analysis, such as SSB and SSBw (Pan, 2009), can be applied for combined analysis of low and common frequency variants. There are also modified versions of rare variant tests for combined analysis such as SKAT-C (Ionita-Laza et al., 2013), and methods by Chen et al. (2012) and Curtis (2012).
In this study, we compare several gene-based multiple regression association tests including MLC tests under various trait models with low frequency causal variants. We compare different analytic strategies for study of both common and low frequency variants by formulating regression models that analyse common and low frequency SNPs together, common SNPs alone, or rare SNPs alone. We also investigate the conditions in which specific statistics tend to perform better than others.

REGRESSION-BASED FRAMEWORK
When there are multiple SNPs in a gene, multi-SNP analysis can be performed by multiple regression with multi-parameter hypotheses, or alternatively, by combining the results of single-SNP marginal regression analysis. Both approaches require coded genotype data. Here we assume an additive genotype model with the minor allele chosen as risk allele such that the genotype is the count of the minor allele. Suppose that K SNPs in a gene, denoted as X = (X 1 , X 2 , . . . , X K ), have been genotyped and coded as 0, 1, or 2.
The multi-SNP joint regression model of K SNPs is formulated as: is the expected value of quantitative trait Y. Global tests of association based on the regression analysis results are constructed using beta estimatesβ = (β 1 , . . . ,β K ) and covariance estimates B from multi-SNP multiple regression. A Wald test of the global null hypothesis of no association (β j = 0 for all j) against the alternative that at least one β j = 0 is defined as Bβ with an asymptotic null distribution that follows a chi-square distribution with K degrees of freedom (df ).
The maximum value of an individual SNP test statistic can become a global statistic with proper adjustment for multiple testing. This can be done in joint regression analysis with a statistic defined as . Because a simple Bonferroni p-value correction is too conservative due to the correlation between beta estimates arising from the underlying LD in the SNPs, we apply a multiple testing adjustment based on assuming a multivariate normal distribution for the test statistics (James, 1991;Conneely and Boehnke, 2007). The marginal regression models for each of K different single SNPs are formulated as: Global tests of association based on these regressions are constructed with beta estimatesβ = (β M 1 , . . . ,β M K ) from the marginal single-SNP regressions and covariance matrix M B . For the latter, the covariance between marginal beta estimates from individual SNP analyses can be estimated using a GEE-type method as suggested by Pan (2009). As in the multi-marker joint regression, a global Min-P statistic in marginal analyses is are the test statistics for each marginal analysis.

GENE-BASED MULTI-MARKER TEST STATISTICS
As summarized in Table 1, we compared eleven global statistics, based on joint or marginal regression that can be applied to the genotyping data of a set of common and/or low frequency variants. In addition to the Wald and MinP tests defined above, we also consider: (1) MLC-B and MLC-Z tests MLC-B and MLC-Z tests are two related multi-bin multi-marker regression tests, one based on the beta coefficients and the other based on the corresponding Z statistics (Yoo et al., 2013). MLC  Yoo et al., 2013. b Gauderman et al., 2007. c Pan, 2009. d Wu et al., 2011. e Ionita-Laza et al., 2013 tests require construction of bins with high correlation between SNP genotypes within a bin, and low correlation between SNP genotypes in different bins. Suppose L bins have been obtained. Then the MLC-B test is constructed usingβ = (β 1 , . . . ,β K ) and the covariance matrix B with a weight matrix W s and takes the form: where W s = ( −1 B · J)(J T · −1 B · J) −1 and J is a K by L matrix indicating bin assignment of the SNPs, i.e., J ij = 1 if the ith SNP belongs to the jth bin and J ij = 0 if not.
MLC-Z is constructed similarly using the standardized test The asymptotic null distributions of MLC-B and MLC-Z tests are chi-square with L df. The Wald test is a special case of the MLC-B test where J is the K by K identity matrix, which corresponds to each SNP constituting a singleton bin.
(2) LC-B and LC-Z tests At the other extreme, if one bin includes all SNPs in a gene, the MLB test reduces to a linear combination (LC) test. From the definition of MLC-B and MLC-Z, LC-B, and LC-Z tests can be formulated as: The asymptotic null distributions of LC-B and LC-Z tests are chi-square with 1 df.
(3) PC-80 test MLC tests reduce the dimension of testing by summing effects of correlated SNPs. A related method uses principal components of the SNP genotypes as variables in a multiple regression. Here, a gene-based test is constructed from the regression analysis of a subset of principal components (Gauderman et al., 2007), with principal components selected by a criterion of genotypic variance explained. Assuming the principal components are ordered by the size of variance explained from the largest (P 1 ) to smallest (P K ), P 1 , . . . , P S is the smallest set that explains more than 80% of the variance. Then the regression using S principal components is modeled as: Using the estimated beta coefficients of principal componentŝ with an asymptotic null distribution that follows chi-square with S df. When all K of the principal components are included in the regression, the test statistic is the same as the Wald statistic defined above for joint regression.
(4) SSB and SSBw test Pan (2009) proposed quadratic test statistics based on the results of marginal analysis in which squared beta coefficients are summed to form a global test with (SSBw) or without (SSB) weighting by the variance of the beta estimates. The statistics are defined as: which have null distributions that can be approximated by a mixture of independent chi-squared components with 1 df (Pan, 2009).
(5) SKAT The sequence kernel association test (SKAT) proposed by Wu et al. (2011) is a quadratic score test with flexibly devised weights that upweight rare variants. The SKAT statistic is constructed as where Y is the n by 1vector of phenotypes, X is the n by K matrix of genotypes, and the weights are set as w i = {β p i ; 1, 25 } 2 , according to the density function of the beta distribution for the MAF p i of the kth SNP. Asymptotically, the null distribution of SKAT follows a mixture distribution of independent chi-squared components with 1 df.

COMBINED ANALYSIS OF COMMON AND LOW FREQUENCY VARIANTS
To investigate the performance of gene-based tests for combined analysis of common and low frequency variants, we compared three approaches. In the first, we made no distinction between the low frequency and common variants within a gene, analysed all the variants in one multiple regression or multiple single regressions, and constructed global test statistics from all variants combined. Then we repeated analyses separately for the low frequency variants (1% < MAF < 5%) and the common variants (MAF ≥ 5%) within each gene. For the MLC statistics, the bin construction was conducted independently in each of the three analyses. Bins can be determined by any clustering algorithm of SNPs according to the LD measure r. We specified r 2 > 0.5 as the threshold for binning and used the LDSelect algorithm (Carlson et al., 2004) which is a greedy algorithm that constructs clusters beginning with the bigger bins first. Within each bin thus constructed, we applied the coding correction method of Pan (2009) and Wang and Elston (2007). This correction algorithm proceeds sequentially, and switches coding of 0/1 for base and risk alleles if a SNP has too many negative r-values with other SNPs (more than half).
We also adapted the mixture statistic SKAT-C proposed by Ionita-Laza et al. (2013) for combined analysis of rare and common variants: Here each of the SKAT statistics uses a separate set of variants with different weighting schemes: w i = {β(p i ; 1, 25)} 2 for the set of low frequency variants, and w c i = {β(p i ; 0.5, 0.5)} 2 for the set of common variants. The mixture parameter is specified as ϕ = SD(SKAT LF )/{SD(SKAT LF ) + SD(SKAT common )}where SD is the standard deviation of the SKAT statistics. Asymptotically, the null distribution of SKAT-C follows a mixture distribution of independent chi-squared components with 1 df.

INDIRECT ASSOCIATION FOR OMITTED CAUSAL SNPs
In the simulation study which follows below, we assume the causal variants have not been typed and are not included in the joint or marginal regressions. This corresponds, for example, to a GWAS setting with genotyping of common variants supplemented by low frequency variant genotyping that is substantially less dense than sequencing. In this case, the genotyped SNPs in the analysis set are expected to indirectly capture the causal effect, depending on how well they tag the causal variants, i.e., depending on the strength of their relationship with the causal variants. However, the regression coefficients of the genotyped SNPs will be less than that of the unobserved causal variant. In the next paragraphs, we give expressions for the expected values of the beta estimates of the markers included in the multi-SNP regression analysis using an omitted variable bias estimation procedure (Greene, 2000, pp. 334-335). We evaluate these expressions empirically for selected genes from HapMap III under trait models with one or two causal variants, and use the evaluations to help interpret the results of the simulation studies we designed to compare the gene-based test statistics.

Trait model with one causal variant
Suppose that C is the genotype variable of an unobserved causal variant not included in the analysis set of K SNPs with genotypes X = (X 1 , X 2 , . . . , X K ). We assume the true trait model (with a mean Y of zero and a null intercept) is is the vector of expected slope coefficients from the regression model This can be easily shown from the least squares estimation equation forβ * = β 0 ,β 1 , . . . ,β K : where X is the n by (K + 1) genotype matrix including a column for the intercept, Y is the phenotype vector for n subjects, C is the n by 1 genotype vector for the causal SNP, and ε is the residual error vector. Equation (1) For the marginal analysis of each SNP, the indirect association for each SNP in the analysis set is determined for a trait model with one causal variant Y = a 1 C + ε as where ρ Ci is the correlation between the causal SNP and ith SNP in the analysis set, and p C and p i are minor allele frequency (MAF) values of the causal SNP and the ith SNP, respectively. Likewise, σ C and σ i are the standard deviations of the genotype variables for the causal SNP and the ith SNP, respectively.

Trait model with two causal variants
More generally, if the true trait model involves two causal SNPs such that . . , f K ) are the expected slope coefficients in the regression models for each of two causal SNP genotypes where C 1 and C 2 are the n by 1 genotype vectors of the two causal SNPs.
For the same trait model, Y = a 1 C 1 + a 2 C 2 + ε, the expected marginal association is where ρ C 1 i and ρ C 2 i are the correlations between each causal SNP and the ith SNP, p C 1 and p C 2 are MAF values of the causal SNPs, and σ C 1 and σ C 2 are the standard deviations of the genotype variables for the causal SNPs.

SIMULATED DATA AND EMPIRICAL POWER EVALUATION
To evaluate the performance of different gene-based tests, we simulated quantitative trait values and genotypes in 85 gene regions which we identified in HapMap phase III, based on data for 170 individuals in the Asian population. First we excluded SNPs with MAF less than 1% from the HapMap genotype data, and then using a list of 16514 genes across 22 autosomes from the UCSC genome annotation database for NCBI hg18 Build 36.1 (http://hgdownload.soe.ucsc.edu/goldenPath/ hg18/database/), we defined gene regions and constructed bins for each gene using the LDSelect algorithm with the threshold value of r 2 > 0.5. We selected genes with 8-15 SNPs, and required the occurrence of 3 or more low frequency SNPs in the same bin for at least one bin, which yielded 85 genes remaining after all criteria for selection were applied. Here, we categorize SNPs with MAF ≥ 0.05 as common SNPs and SNPs with 0.01 ≤ MAF < 0.05 as low frequency SNPs. The list of genes and the distribution of low frequency and common SNPs are presented in Figure 1. The average of absolute r across 85 genes was 0.37 [95% CI: (0.35,0.39)] and the range was 0.17-0.59. We considered five trait models that differed according to the number of causal SNPs, the frequency category (low frequency or common) of each causal SNP, and the direction of the causal SNP effects ( Table 2). For each gene, we generated genotype data for each of n = 5000 people by randomly pairing haplotypes from the haplotype pool for the phased genotype data of the HapMap Asians. Then the causal SNPs were randomly assigned for each gene based on the conditions for the trait model. The low frequency causal SNPs were selected from the bin of 3 or more correlated low frequency SNPs identified at the stage of gene selection. The common causal SNPs were selected randomly from among all common SNPs. To generate quantitative trait data, we specified an additive model based on allele counts of the causal variant and a normal error model with a specific variance value. We adjusted the variance for each trait model and each gene such that the power of the Wald test is roughly 80%, to improved comparability among the genes and among the trait models. Since we limited the range of standard deviation to between 0.0001 and 100, there were several cases where the 80% power was not achieved. This procedure was repeated for each of the five trait models (that is, new genotypes were generated for each model).
We examined three analysis sets to evaluate the effects of subsetting SNPs based on MAF: (1) set of all SNPs in a gene, (2) set of common SNPs in a gene (MAF ≥ 0.05), (3) set of low frequency SNPs (0.01 ≤ MAF < 0.05). For each SNP set, joint and marginal regression analyses were performed in N = 1000 simulation replicates of 5000 individuals. To characterize the trait models, expected beta coefficients were summarized in various ways and averaged over genes ( Table 3). In each simulation replicate, several gene-based multi-marker methods, including the MLC tests, were applied and compared. These statistics, summarized in Table 1, were chosen to include linear and quadratic statistics based on joint or marginal regression analysis. The empirical type I error and power of each statistic corresponding to a nominal 5% critical value were obtained as the proportion of data sets in which the asymptotic p-value was less than 0.05 among 1000 replicates.

COMPARISONS AMONG ANALYSIS OF THREE SNP SETS
We compared the power of gene-based tests obtained from three analysis sets for each gene panel: (1) set of all SNPs in a gene (common and low frequency), (2) common SNPs only, (3) low frequency (LF) SNPs only. We report the type I error evaluation and empirical power values averaged across the 85 genes with corresponding confidence intervals (Figure 2, Tables 4, 5). Comparison plots of the three analysis sets for most of test statistics (except MLC-Z and LC-Z since they are virtually equal to MLC-B and LC-B) also display the power values for each of the 85 genes (Figures 3-8).
Empirical type I error, averaged over 85 genes was not substantially different from the nominal 0.05 level. (Note that the CIs www.frontiersin.org November 2013 | Volume 4 | Article 233 | 5

FIGURE 1 | The distribution of common and low frequency SNPs for each of 85 genes used for the simulation study.
are constructed from the standard deviation of the gene-specific type I error estimates, so tend to be quite narrow). There was slight elevation of empirical type I error for MinP-M, especially for the analysis of only LF SNPs. This likely reflects an inadequacy of the multivariate normal distribution approximation used for correlated multiple testing (Conneely and Boehnke, 2007). The empirical type I error for trait model 1 was slightly inflated across all tests.
Depending on the trait model, the choice of analysis set affected power differently (Figure 2). For Models 1 and 2, analysis using only LF SNPs was most powerful, while analysing only common SNPs was least powerful, and using the combined set yielded power slightly lower than using LF SNPs alone. In contrast, for Model 3, power was somewhat higher using all SNPs, and lowest for the LF SNPs. For Models 4 and 5, which have one common causal SNP and one LF causal SNP, the combined and  Model 2 Two deleterious low frequency causal SNPs in the same bin a 1 = 1, a 2 = 1 Model 3 Two low frequency causal SNPs, one deleterious and one protective in the same bin a 1 = 1, a 2 = −1 Model 4 One common frequency causal and one low frequency causal SNP, both deleterious a 1 = 1, a 2 = 1 Model 5 One deleterious common frequency causal and one protective low frequency causal SNP a 1 = 1, a 2 = −1 *The trait model is Y = a 1 C 1 + a 2 C 2 + ε where ε ∼ N(0, σ 2 ). σ 2 is specified for each gene such that the power of Wald test is about 0.8. common SNP sets showed similar power in comparison to lower power in the LF set.
Since the causal SNPs in Models 1 and 2 have low frequency and most genes have at least one LF SNP that is strongly correlated with the causal SNPs, the analysis of LF SNPs alone is usually an efficient choice in terms of df and tagging power for causal effect. Although Model 3 also specifies two low frequency causal variants, with the combination of deleterious and protective effects, a 1 = 1 and a 2 = −1, analysis of LF SNPs alone had the lowest power. In this case, a LF SNP that is positively correlated with both causal SNPs will usually appear almost unassociated with the quantitative trait. These observations may be further understood by the expected beta coefficients calculated using equations (1) and (2) ( Table 3). The percentage of strongly associated SNPs (|β| > 0.5) is high for LF SNP analysis in Models 1 and 2,

Frontiers in Genetics | Statistical Genetics and Methodology
November 2013 | Volume 4 | Article 233 | 8  but substantially lower for Model 3. Also, the mean of |β| is higher in the LF SNP analysis compared to all SNP or common SNP analysis in Models 1 and 2, whereas it was lower in Model 3. In Models 4 and 5, however, the two causal SNPs were not required to be within the same bin. So the common causal SNP was more likely to be well-tagged by common SNPs, and analysis of LF SNPs alone had lower power irrespective of whether the LF causal variant was deleterious or protective. The percentages of strongly associated SNPs (|β| > 0.5) in the analysis using all SNPs or common SNPs were both higher for Models 4 and 5 when compared with their counterparts, Models 2 and 3, respectively. However, in the analysis using LF SNPs these percentages were lower for Model 4 compared to Model 2, but higher for Model 5 compared to Model 3, which is consistent with the power results for these models.

COMPARISONS AMONG GENE-BASED TESTS
We compared the performance of gene-based tests for each trait model under the three gene sets analyses. In general, the Wald test was more powerful and robust across different simulation scenarios, while differences in power among the other tests were variable, depending on the scenario (Table 5 and Figures 2-7).
Under Models 1-3 in which the causal SNPs are all LF variants, the Wald test was notably more powerful than other tests when analysing only common SNPs. When we compared the distribution of expected beta coefficients from joint and marginal regression analysis of common SNPs, we found that the percentages of strongly associated SNPs (|β| > 0.5) was high for joint analysis but low for marginal analysis for Models 1-3 (Table 3). Since pairwise correlation is not likely to be strong if the tagging SNP is common and the causal SNP is rare (Figure 8), marginal effects of common SNPs under a LF causal model usually are not strong. Corresponding to these results, the statistics based on marginal analysis of common SNPs such as MinP-M, SSB, SSBw, and SKAT did not perform well for Models 1-3. With the common SNP analysis, the joint regression analysis captured rare causal effects better than marginal analysis, presumably because of the presence of three-way or higher-order LD among the causal SNP and two or more common SNPs.
However, except for the Wald test, the MLC and LC tests based on joint regression did not perform well-under Models 1-3 for common SNPs alone. The sum of β from the joint regression analysis of common SNPs for these models was much smaller than the sum of |β|, suggesting co-occurrence of deleterious and protective  (Table 3). This gives some insight into the underperformance of MLC and LC tests for common SNP analysis for these models even though the joint regression captures the low frequency causal effect to some degree.
For Model 3 where the causal effects are opposing, the empirical power of MLC tests, MinP-M, PC80, SSB, SSBw, and SKAT with analysis of all SNPs was substantially lower than that of the Wald test, whereas for the other trait models, these tests were more powerful than the Wald when the analysis included all SNPs (Figures 2, 5). The expected beta coefficient for the marginal association was low for both common and rare SNPs, which resulted in relatively low power for the tests based on marginal analysis ( Table 3). The joint analysis captured the causal effect better than the marginal analysis for the case of Model 3, but neither of the MLC or LC tests perform well since the captured effects are opposing as indicated by the sum of β near to zero. Model 3 is essentially a "worst case" for the MLC test construction because the opposing LF SNPs are positively correlated and are assigned to the same bin. For Models 4 and 5 where two causals were in different frequency groups, and therefore usually in different bins, MLC tests performed best for the analysis using both common and LF SNPs (Figures 2, 6, 7). This can most likely be explained by reduced df and low prevalence of opposing effects for MLC tests, while the effects of the two causal SNP from both frequency groups are captured well.

DISCUSSION
In this study, we examined the performance of several multimarker methods that can be applied to combined analysis of common and low frequency variants. Using 85 different gene panels which include many low frequency SNPs, we simulated trait models with untyped low frequency causal SNPs. Moreover, by calculating the expected beta estimates of indirect association for joint and marginal regression analysis, we provide some insight into the performance of gene-based statistics in different situations.
In our comparison of different analysis sets of SNPs, we found that combined analysis of low frequency and common SNPs together is a robust choice that works for various trait models whereas analysis using only common SNPs or only low frequency SNPs can lose power in certain situations. The good performance of multi-marker tests using a combined set of SNPs is not surprising when one of the causal SNPs is common and the other is rare. Also, when causal SNPs consist of only low frequency variants, it is natural to expect better performance in analysis of only low frequency SNPs due to smaller df and correlation between typed/analysed SNPs and untyped causal SNPs, but the reduction in power incurred in the combined set of SNPs was rarely very large. Furthermore, for the trait model in which the causal effects are opposing (one deleterious, and one protective), analysis using the combined set of SNPs was a better choice.
Across the different trait models we investigated, the statistic that showed the most robust performance was the Wald test. In our previous study of the MLC and other gene-based statistics, MLC tests, MinP-M, PC80, and SSB tests using common SNPs usually performed better than the Wald test for trait models based on common causal SNPs (Yoo et al., 2013). In this study, the occasional poor performance of statistics based on marginal tests for low frequency causal variant trait model occurred when the marginal regression analysis failed to capture the low frequency SNP effects due to lower correlation with the causal SNP (see Figure 8). The reason for poor performance of MLC tests differed from that of tests based on marginal regression analysis. Since the joint regression analysis was usually better in capturing low frequency causal effects due to multilocus LD, the Wald test performed well. However, the effects captured by multiple SNPs were mostly in opposing directions and since the SNPs had high positive correlation, and fell within the same bin, the MLC tests usually suffered.
The genotypes in our simulation were derived from HapMap haplotypes, and therefore are expected to represent realistic values occurring in an Asian population, at least for more common SNPs. Therefore, the genes selected to include many low frequency correlated SNPs were only 85 in number when we limited the gene size to be between 8 and 15. We expect genotyping data obtained through sequencing study would have a large number of low frequency correlated SNPs and more diversity in gene structure. Further simulation studies based on sequencing data might be needed to address realistic gene structure in a broad sense. Along the same lines, it would be of interest to evaluate the use of imputed SNPs for multi-marker tests. If we could remove or reduce the bias caused by the omitted causal SNPs and use proper global tests for the imputed SNPs, more powerful analysis may be performed.
Many popular multi-marker tests for rare variants are based on marginal analysis, but we were able to confirm the merit of joint regression analysis for certain trait models. Tests based on joint regression analysis are in need of further development. Joint regression analysis is more suitable for combined analysis of common and low frequency variants in a gene-based analysis framework. Also, further study of a combination of gene-based tests having different merits for different situations would be warranted.

AUTHOR CONTRIBUTIONS
Yun Joo Yoo participated in design of research, computational analysis, and drafting of the paper. Lei Sun participated in design of research, interpretation of the results, and revising the paper with critical content. Shelley B. Bull participated in design of research, interpretation of the results, and drafting of the paper.