An Efficient Score Test Integrated with Empirical Bayes for Genome-Wide Association Studies

Many methods used in multi-locus genome-wide association studies (GWAS) have been developed to improve statistical power. However, most existing multi-locus methods are not quicker than single-locus methods. To address this concern, we proposed a fast score test integrated with Empirical Bayes (ScoreEB) for multi-locus GWAS. Firstly, a score test was conducted for each single nucleotide polymorphism (SNP) under a linear mixed model (LMM) framework, taking into account the genetic relatedness and population structure. Then, all of the potentially associated SNPs were selected with a less stringent criterion. Finally, Empirical Bayes in a multi-locus model was performed for all of the selected SNPs to identify the true quantitative trait nucleotide (QTN). Our new method ScoreEB adopts the similar strategy of multi-locus random-SNP-effect mixed linear model (mrMLM) and fast multi-locus random-SNP-effect EMMA (FASTmrEMMA), and the only difference is that we use the score test to select all the potentially associated markers. Monte Carlo simulation studies demonstrate that ScoreEB significantly improved the computational efficiency compared with the popular methods mrMLM, FASTmrEMMA, iterative modified-sure independence screening EM-Bayesian lasso (ISIS EM-BLASSO), hybrid of restricted and penalized maximum likelihood (HRePML) and genome-wide efficient mixed model association (GEMMA). In addition, ScoreEB remained accurate in QTN effect estimation and effectively controlled false positive rate. Subsequently, ScoreEB was applied to re-analyze quantitative traits in plants and animals. The results show that ScoreEB not only can detect previously reported genes, but also can mine new genes.


INTRODUCTION
Genome-wide association studies (GWAS) have become a powerful approach in the genetic dissection of quantitative traits in human, animal and plant genetics (Buniello et al., 2019;Jiang et al., 2019). A number of statistical methods for GWAS have been developed to facilitate the discovery of potentially associated genetic variants. Linear mixed model (LMM) approaches have been widely used due to the capacity to correct genetic relatedness and population structures, thereby minimizing false positives (Zhang et al., 2005;Yu et al., 2006;Aulchenko et al., 2007). Consequently, the number of LMM-based computational tools for genetic studies is rapidly increasing, and includes efficient mixed model association (EMMA) (Kang et al., 2008), a compressed MLM with population parameters previously determined (P3D) (Zhang et al., 2010), factored spectrally transformed linear mixed models (FaST-LMM) (Lippert et al., 2011), genome-wide complex trait analysis (GCTA) (Yang et al., 2011), genome-wide efficient mixed model association (GEMMA) (Zhou and Stephens, 2012), BOLT-LMM (Loh et al., 2015), and the rapid and efficient linear mixed model approach using the score test (LMM-Score) (Chang et al., 2019b). Although these methods have successfully detected a number of variants among various traits, they still have some shortcomings. Most adopt single-locus screening, so that the combined effects of multiple loci are ignored and the threshold in multiple test correction is often difficult to determine (Wang et al., 2016;Ren et al., 2018;Wen et al., 2018).
Several classical approaches have been proposed to address these issues, such as, the least absolute shrinkage and selector operator (Lasso) (Tibshirani, 1996), Elastic-Net (Zou and Hastie, 2005), Bayesian Lasso (Park and Casella, 2008), and Empirical Bayes (Xu, 2010). These approaches have been shown to perform better than single-locus approaches, but most are computationally unfeasible in GWAS. It brings great challenge when the number of predictors is significantly larger than the number of observations. An available solution is to perform dimensionality reduction prior to variable selection. For example, the multi-locus random-SNP-effect mixed linear model (mrMLM) uses the Wald test based on a random-SNP-effect linear mixed model to reduce dimensionality, then, all the selected markers are placed into a multi-locus model, showing advantage in controlling complex population structure (Wang et al., 2016), integration of the Kruskal-Wallis test with Empirical Bayes with polygenic background control (pKWmEB) uses the non-parametric Kruskal-Wallis test to perform initial screening of all SNPs, which is more powerful in the case in which the phenotypic value violates the assumption of a normal distribution (Ren et al., 2018), fast multi-locus random-SNP-effect EMMA (FASTmrEMMA) first chooses all putative quantitative trait nucleotides (QTNs) with p-values ≤ 0.005 and then includes them in a multi-locus model for true QTN detection , iterative modified-sure independence screening EM-Bayesian lasso (ISIS EM-BLASSO) uses an iterative modifiedsure independence screening (ISIS) approach in reducing the number of SNPs to a moderate size, and next estimates all the selected SNP effects in the reduced model (Tamba et al., 2017), hybrid of restricted and penalized maximum likelihood (HRePML) performs restricted maximum likelihood on single-locus LMM to remove unrelated markers, and then carries out penalized maximum likelihood to select true QTN (Ren et al., 2020), a fast Empirical Bayes method (Fast-EB-LMM) uses a modified kinship matrix accounting for individual relatedness to avoid competition between the locus of interest and its counterpart in the polygene (Chang et al., 2019a), and multi-locus mixed-model (MLMM) adopts stepwise mixed-model regression with forward inclusion and backward elimination, and handles the confounding effects of large numbers of loci well (Segura et al., 2012). Although these multi-locus methods have achieved good results in many GWAS analyses, their computational efficiency is not very satisfactory.
Fortunately, the score test can greatly decrease computational time. Furthermore, a major advantage of the score test is that it only requires imputation under the null model of no association, and working within the framework of the score test makes other extensions feasible. Xiong et al. (2002) proposed a generalized Hotelling's T2 test for the analysis of quantitative and qualitative traits, and Wallace et al. (2006) extended it to a marker-based score test for linkage disequilibrium mapping by selective genotyping. To further improve computational efficiency, Tang et al. (2009) developed a principal component-based score test within a variable-sized sliding-window. To incorporate additional phenotypic information of relatives who are not genotyped, Thornton and McPeek (2007) proposed the more powerful quasi-likelihood score (MQLS) test. Uh et al. (2009) extended the MQLS to the genotypic MQLS (gMQLS) test to accommodate different genetic models. However, the aforementioned score tests are unable to estimate quantitative trait nucleotide (QTN) effects, and are weak in controlling confounding.
In this study, we proposed an efficient association analysis approach by integrating the score test with Empirical Bayes, named ScoreEB, under the framework of the mrMLM (Wang et al., 2016) and FASTmrEMMA  approaches. Firstly, the score test is performed to select all of the markers that are potentially associated with the trait, taking into account the genetic relatedness and population structure within the linear mixed model. The mixed model equations were solved using preconditioned conjugate gradient iteration (PCG), which requires only performing matrix-vector products. The PCG algorithm is one of the best known iterative methods for solving linear systems with symmetric, positive definite matrix (Legarra and Misztal, 2008;VanRaden, 2008). Secondly, all of the selected markers are placed into a multi-locus model and their effects are estimated by Empirical Bayes. Then, all of the nonzero effects are further identified by a likelihood ratio test. Our new method ScoreEB adopts the similar strategy of mrMLM (Wang et al., 2016) and FASTmrEMMA , and the only difference is that we use the score test to select all the potentially associated markers. ScoreEB fills the gap between existing multi-locus and single-locus method, and it not only has high computational efficiency, but also can control confounding well. To validate the effectiveness of our method, we compare it with other five methods, mrMLM (Wang et al., 2016), FASTmrEMMA , ISIS EM-BLASSO (Tamba et al., 2017), HRePML (Ren et al., 2020), and GEMMA (Zhou and Stephens, 2012) using a series of simulation studies and real data analysis in plants and animals.

Random QTN Effect Linear Mixed Model
A conventional linear mixed model used for association testing can be expressed as where y denotes an n × 1 quantitative phenotype vector for n individuals; X denotes the n × c fixed effect design matrix, c denotes the number of covariates, including unit vector, population structure (Yu et al., 2006) or principle component (Price et al., 2010), and b denotes their effect sizes including the intercept μ; x denotes an n × 1 genotype vector of the focal QTN, and β ∼ N(0, σ 2 g ) denotes random QTN effect; the variable u is a random vector and can be used to account for additional additive effects, such as polygenic effects and other additive confounding factors, u ∼ MVN(0, Kσ 2 k ) is multivariate normal distribution, σ 2 k denotes the variance component of polygenic effects, K denotes an n × n genetic relatedness matrix; and ε ∼ MVN(0, σ 2 e I n ) denotes independent and identically distributed noise, σ 2 e is residual error variance, and I n is an n × n identity matrix.

Parameter Inference and Score Test for Association Test
For parameter inference, a marginalized form of Model (1) is considered, which is obtained by integrating over the QTN effects β and the polygenic random effect component u Note that various methods of inferring a genetic relatedness matrix have been proposed. In this study, we used a markerinferred genetic relatedness matrix (Price et al., 2010) defined as Here, G (x 1 , x 2 , /, x m ) is the whole genotype matrix, m is the number of markers. Let Σ σ 2 g xx T + σ 2 k K + σ 2 e I n , and Σ is a positive semi-definite symmetric matrix. Now the multivariate normal distribution is The following log likelihood function can be easily obtained where θ (σ 2 g , λ), nuisance parameter λ (b, β, σ 2 k , σ 2 e ). We note that the hypothesis for β, H 0 : β 0, H 1 : β ≠ 0 is equivalent to H 0 : σ 2 g 0, H 1 : σ 2 g > 0. The score test statistics can be computed analogously to the procedure described in Wu et al. (2011).
where we have defined In the model in Eq. 6, the vector b can be estimated via null model maximum likelihood estimation (MLE): The matrix M 0 denotes the total covariance matrix estimated under the null model where σ 2 k and σ 2 e correspond to the null model moment estimation of σ 2 k and σ 2 e (Wu and Sankararaman, 2018). The introduction of the K matrix makes M 0 a dense matrix, which presents a significant challenge in computation. However, we adopt preconditioned conjugate gradient iteration to solve this problem (Legarra and Misztal, 2008;VanRaden, 2008), i.e., computation of expressions of the form M −1 0 y and [M −1 0 X 1 , /, M −1 0 X c ], which can improve computing speed and reduce memory usage, particularly for large individuals. The statistics T score follows the chi-square distribution with one degree of freedom T score ∼ χ 2 1 (10) p-values can be computed via Davies method (Davies, 1980).

Empirical Bayes Estimation for QTN Effects
We conduct variable selection in a multi-locus model where y, X, b and ε are the same as those in Model (1); q is the number of markers selected in single-locus scanning; β i is the random effect for marker i, and x i is the corresponding designed matrix for β i . Obviously, the parameters of interest to be estimated are (β 1 , β 2 , /β q ). Empirical Bayes Xu (2010) was performed to estimate the QTN effects in Model (11). In this method, each QTN effect β i is viewed as random. With the Bayesian hierarchical model, a normal prior is adopted for β i ∼ N(0, σ 2 i ), and the scaled inverse χ 2 prior for σ here, (τ, ω) (0, 0) is used, that is the Jeffrey's prior (Figueiredo, 2003). The following shows the procedure of Empirical Bayes for parameter estimation.

1) Initial-step: Assign initial values to parameters with
2) Expectation-step: QTN effect can be estimated by 3) Maximization-step: Update parameters b, σ 2 e , σ 2 Frontiers in Genetics | www.frontiersin.org October 2021 | Volume 12 | Article 742752 where Repeat 2) and 3) until convergence. All the markers with β i ≤ 10 −4 were excluded in the first step, the likelihood ratio test was then conducted on the estimate of other marker effect β i . Because Empirical Bayes is a multi-locus model, there is no requirement for Bonferroni correction (Wang et al., 2016). Instead of using 0.05/m as a significant threshold, where m is the number of markers, the criterion of logarithm of the odds (LOD) 3.0 was set up (Wang et al., 2016;Ren et al., 2018;Wen et al., 2018). This criterion is frequently adopted in linkage analysis and is the equivalent of P Pr(χ 2 1 > 3.0 × 4.605) ≈ 0.0002, where LOD follows a χ 2 1 distribution and LOD LR/4.605.

Simulation Study
We performed three simulation experiments to validate ScoreEB. In the first simulation experiment, 216,130 SNPs in Atwell et al. (2010) was used as the simulated genotype. The sample size was equal to the number of individuals, that was 199. Six QTNs were simulated and placed on the SNPs with allelic frequencies of 0.30, their heritability was set as 0.10, 0.15, 0.05, 0.05, 0.05, and 0.05, and their positions and effects are listed in Table 1. Three level of heritability (0.05, 0.10 and 0.15) was to investigate the ability of different methods to detect QTNs with different heritability. The differences between our simulation study and previous methods mrMLM (Wang et al., 2016) and ISIS EM-BLASSO (Tamba et al., 2017) were as follows: 1) We used 216,130 SNPs as the simulated genotype rather than employed 10,000 SNP genotypes. If the computational capacity allowed, more SNP markers could reflect the reality. 2). The genotype coding was different. We used 0, 1 and 2 to represent "aa", "Aa" and "AA", however, they used −1, 0 and 1 to represent "aa", "Aa" and "AA". 3). The order of QTNs was sorted based upon the heritability of 0.10, 015, 0.05, 0.05, 0.05 and 0.05 in our study. The SNPs in high LD (linkage disequilibrium) with the assumed QTNs are listed in Table 2.
The phenotype including a polygenic background was simulated by the model y μ is the residual error. The mean value of the phenotype μ was set to 10.0. Here we set residual variance σ 2 e 10.0 and polygenic variance σ 2 k 2.0. With h 2 t σ 2 g /(σ 2 g + σ 2 e + σ 2 k ) 0.05 × 4 + 0.10 + 0.15 0.45, that is σ 2 g /(σ 2 g + 10 + 2) 0.45, total genetic variance σ 2 g and each QTN genetic variance σ 2 i (i 1, /, 6) could be obtained. The heritability of polygenic effect h 2 k is σ 2 k /(σ 2 g + σ 2 e + σ 2 k ) 2/(9.82 + 10 + 2) ≈ 0.092, which is nearly one QTN with heritability 0.10, this can make polygenic effect having a moderate impact. Each QTN true effect can be obtained , where η i denotes the minor allele frequency (MAF), and h 2 i denotes the heritability of each QTN. The simulation experiment was repeated 1,000 times. The false positive rate (FPR) was defined as the ratio between the number of non QTNs wrongly categorized as positive and the total number of actual non QTNs. To evaluate the variance and bias of each QTN effect estimate, the mean squared error (MSE) was calculated. We defined MSE as where R i is the total number of detected ith QTN, i 1, /, 6 is the ith QTN, β is is the estimated effect of QTN i from the sth repeat, and β i is the true effect of QTN i. In the second simulation experiment, the phenotypes without polygenic effect were simulated by the model Other parameters were the same as those in the first experiment, and all the parameters are listed in Supplementary Table S1.
In the third simulation experiment, we intended to investigate the influence of the sample size on the running time. The sample size was set to 200, 500, 1,000, and 2,000, respectively. Meanwhile, the number of markers was fixed at 50,000. And repetition times was set to 100. In addition, five-fold cross-validation test was performed to decide the choice of two hyperparameters in Empirical Bayes step of ScoreEB at different sample sizes (Supplementary Table S2).

Real Datasets
We use previously published datasets from multiple species that includes Arabidopsis thaliana, rice, maize, cattle and pig.
The Arabidopsis thaliana dataset consists of 199 accessions each with 216,130 genotyped SNPs (Atwell et al., 2010), and the phenotype FRI gene expression levels (FRI) is re-analyzed. The  (Romay et al., 2013). The cattle dataset has 5,254 samples with 42,551 genotyped SNPs. The phenotype is milk yield (mkg), which is an important economic trait (Zhang et al., 2015). The pig dataset consists of 4,260 samples each with 47,157 genotyped SNPs, and all SNP markers were mapped to Sus scrofa genome build 11.1. The growth performance related phenotype AGE (days to 100 kg) is re-analyzed (Ramos et al., 2009;Tang et al., 2019). The SNPs with a minor allele frequency (MAF) of 5% or less are filtered out. And the SNPs with missing rate of 20% or more are deleted.

RESULTS
To validate the performance of ScoreEB, three simulation experiments and five real datasets analysis were carried out. Each experiment was analyzed by six methods: a fast score test integrated with Empirical Bayes (ScoreEB), multi-locus random-SNP-effect mixed linear model (mrMLM), fast multi-locus random-SNP-effect EMMA (FASTmrEMMA), iterative modified-sure independence screening EM-Bayesian lasso (ISIS EM-BLASSO), hybrid of restricted and penalized maximum likelihood (HRePML) and genome-wide efficient mixed model association (GEMMA). We performed simulated and real data analysis using six GWAS methods on the same computer (Intel ® Core ™ i9-10855H CPU 2.40 GHz, Memory 64 GB), which has 8 cores and 16 threads. The versions of R and gcc are r-base-4.0.5 and 7.5.0, respectively, based on the Ubuntu 18.04 operating system.

Statistical Power Under Different Levels of FDR and Type I Error
Genetic markers were classified into the ones on QTN-area and non-QTN area to evaluate statistical power under different levels of FDR and Type I error. And 10 3 bp was selected as the window size in our simulation analysis. In the first simulation experiment where six QTN effects and an additive polygenic effect were involved, the area under the Power-FDR curve (AUC.FDR) for ScoreEB, mrMLM, FASTmrEMMA, ISIS EM-BLASSO, HRePML and GEMMA methods were 0.4405, 0.4651, 0.4583, 0.4020, 0.4385 and 0.3358, respectively, showing that ScoreEB along with mrMLM and FASTmrEMMA has the similar power, which are significantly higher than GEMMA ( Figure 1A). The power of HRePML and ISIS EM-BLASSO were lower than ScoreEB, while higher than GEMMA. In the second simulation experiment when only six QTN effects were added to the phenotype, the AUC.FDR for the above six methods were 0.4241, 0.4498, 0.4354, 0.3743, 0.3883 and 0.3129, respectively, indicating that the three multi-locus methods ScoreEB, mrMLM and FASTmrMLM still have the higher power than other methods, especially the single-locus method GEMMA ( Figure 1B). And the area under the Power-Type I error curve demonstrated the similar trends ( Figures 1C,D). Clearly, the power of ScoreEB is comparable to that of the other two multilocus methods mrMLM and FASTmrEMMA.

Accuracy for Estimated QTN Effects
We used the mean squared error (MSE) to measure the accuracy of QTN effect estimation. The smaller the MSE, the better the accuracy of the method. We evaluated the accuracies for all of the six simulated QTNs across six methods. In the first simulation experiment, results demonstrated that the average MSEs with ScoreEB, mrMLM, FASTmrEMMA, ISIS EM-BLASSO, HRePML and GEMMA were 0.1001, 0.1075, 0.4084, 0.1132, 0.2055 and 5.6487, respectively (Table 1 and Figure 2A). The average MSE of ScoreEB is the minimum. Compared with the average MSE of GEMMA, that of ScoreEB is significantly lower. In the second simulation experiment, results showed the same trend, and the average MSEs of the six methods were 0.0955, 0.1137, 0.3871, 0.1064, 0.1674 and 4.9340, respectively (Supplementary Table S1 and Figure 2B). These results indicate that ScoreEB along with

Rice, Maize, Cattle and Pig
The Arabidopsis data set consists of 199 accessions each with 216,130 genotyped SNPs (Atwell et al., 2010). We re-analyzed flowering time related trait FRI of the Arabidopsis data by ScoreEB, mrMLM, FASTmrEMMA, ISIS EM-BLASSO, HRePML and GEMMA. These methods identified 8, 3, 3, 8, 10 and 33 SNPs significantly associated with FRI trait, respectively. We then detected previously reported genes associated with these SNPs via the Arabidopsis website. As a result, 18, 12, 7, 13, 16 and 17 genes were identified by the above six methods respectively, indicating that ScoreEB detected the most previously reported genes. Notably, FLA was detected by all the six methods at the same time ( Table 3, Supplementary Table   S3 and Figure 3A). Previous studies have shown that FLA encodes a major determinant of natural variation in Arabidopsis flowering time. And dominant alleles of FLA confer a vernalization requirement causing plants to overwinter vegetatively. Although GEMMA detected the most SNPs, these SNPs were only associated with 17 genes. Clearly, ScoreEB was more powerful to mine candidate genes than the other methods in analysis of Arabidopsis.
With above six methods, we conducted a genome-wide association study based on genotyped 44,100 SNPs across 413 diverse accessions in 2007 year flowering time at Arkansas of rice data (Zhao et al., 2011). The SNPs significantly associated with flowering time for ScoreEB, mrMLM, FASTmrEMMA, ISIS EM-BLASSO, HRePML and GEMMA methods were 8, 7, 3, 3, 8 and 3, respectively. Via analysis of gene ontology annotations, the above six methods detected 28, 24, 10, 10, 23 and 8 associated genes, respectively. There were 4 genes located on chromosome 2 identified by ScoreEB, mrMLM and GEMMA three methods at the same Frontiers in Genetics | www.frontiersin.org October 2021 | Volume 12 | Article 742752 time, and 3 genes located chromosome 1 identified by ScoreEB, mrMLM and HRePML simultaneously ( Table 3,  Supplementary Table S3 and Figure 3B). The results demonstrated that ScoreEB not only detected the most SNPs and associated genes, but also was well consistently with other methods. The maize flowering time measured as days to silk was reanalyzed with the same six methods. The genotype of maize data   Table 3). There were 17 genes or QTLs identified at least by four methods simultaneously (Supplementary Table S3 and Figure 3C). Results showed that ScoreEB was also comparable to HRePML, mrMLM and FASTmrEMMA methods in analysis of maize. In addition to its application to flowering time related traits in plants, we analyzed the quantitative traits of cattle and pig. The cattle data set consists of 5,254 samples each with 42,551 genotyped SNPs (Zhang et al., 2015). In the analysis of milk yield (mkg), ISIS EM-BLASSO and ScoreEB detected the most number of significantly associated SNPs, which were 103 and 90, respectively. And there were 72, 34, 17 and 22 significantly associated SNPs identified by mrMLM, FASTmrEMMA, HRePML and GEMMA. Via analysis of gene ontology annotations, the above six methods detected 71, 63, 57, 30, 14 and 21 associated genes, respectively. It was worth noting that ScoreEB identified the VPS28 gene, which was extremely significant with 241.03 lod value and 2.30 × 10 −243 p value ( Table 3). The VPS28 gene could regulate milk fat synthesis through modulating the ubiquitination-lysosome and ubiquitination-proteasome systems (Liu et al., 2018). Besides VPS28 gene, there was other 8 genes identified by at least four methods simultaneously (Supplementary Table S3 and Figure 3D). These results supported that ScoreEB was effective in cattle application.
Using ScoreEB, mrMLM, FASTmrEMMA, ISIS EM-BLASSO, HRePML and GEMMA six methods, we re-analyzed AGE trait in pig based on 47,157 genotyped SNPs 4,260 samples (Ramos et al., 2009;Tang et al., 2019). The number of significantly associated SNPs detected by these methods was 50, 57, 28, 33, 16 and 1, respectively, and the number of identified genes or QTLs around these SNPs was 33, 43, 24, 25, 15 and 1, respectively. There were 6 genes identified at least by four methods simultaneously and ScoreEB detected all these 6 genes, such as, ANKRD6, MALT1 etc., ( Table 3, Supplementary Table S3 and Figure 3E). Meanwhile, ScoreEB and mrMLM identified the most number of associated genes. The single-locus method GEMMA had a poor performance with only 1 gene identified. Results demonstrated ScoreEB was also powerful to mine candidate genes in pig.

Time Complexity
We compared time complexity over M markers and N individuals among the above six methods. In ScoreEB, the time complexity of first stage is O(MN), and that of the second stage is O(tqN 2 ), here, t is the number of iterations required for expectation-maximization (EM) method to converge, q is the number of markers selected in the first stage, and t, q is much smaller than M. The time complexity of mrMLM and FASTmrEMMA in the first stage is difficult to make sure because they call other complicated algorithms, and that of ISIS EM-BLASSO is mainly limit to iterative modifiedsure independence screening (ISIS) step, however, the time complexity of these three methods in the second stage are the same with that of ScoreEB. The time complexity of HRePML is greatly affected by limited memory Broyden-Fletcher-Goldfarb-Shanno (BFGS) method. And the time complexity of GEMMA is O(MN 2 ). The multi-locus method ScoreEB along with mrMLM, FASTmrEMMA and ISIS EM-BLASSO is constrained by Empirical Bayes in the second step, ScoreEB has a high computational efficiency when the number of individuals is not very large (n < 3,000).

Observed Running Time
In the first simulation experiment, the dataset consists of 199 individuals and 216,130 single nucleotide polymorphism (SNP) markers with 1,000 replicates. The total running time for ScoreEB, mrMLM, FASTmrEMMA, ISIS EM-BLASSO, HRePML and GEMMA methods were 5. 6419, 25.0795, 25.9247, 22.7102, 19.6781 and 17.0846 h, respectively ( Figure 4A). The ScoreEB is the most fast, followed by GEMMA, HRePML, ISIS EM-BLASSO, mrMLM and FASTmrEMMA. Clearly, ScoreEB was about 4 times faster than mrMLM and FASTmrEMMA. However, GEMMA was the second faster at the expense of statistical power and estimating QTN effects. In the second simulation experiment, running time shows a similar trend. ScoreEB only take 5.7727 h, which are significantly faster than mrMLM, FASTmrEMMA, ISIS EM-BLASSO, HRePML and GEMMA with 24.9507, 25.9596, 23.1574, 20.2281and 18.2995 h, respectively ( Figure 4B). Results demonstrate that ScoreEB improves computing efficiency considerably compared with the other five methods.
In the third simulation experiment, the dataset consists of 50,000 markers with 200, 500, 1,000 and 2,000 samples, respectively. And repetition times was set to 100. ScoreEB is always the most fast with 0.3708, 0.7041, 2.6965 and 6.7638 h at different sample size, and GEMMA and HRePML are always the second and the third fast, respectively (Table 4 and Figure 5). With sample size 200 and 500, ScoreEB is much faster than GEMMA and HRePML, and the order of computational efficiency in other three methods is ISIS EM-BLASSO, mrMLM and FASTmrEMMA. At these two sample sizes, FASTmrEMMA is the slowest. When the sample size increases to 1,000 and 2,000, the advantage of ScoreEB over GEMMA in computing speed is becoming less and less. The main reason is that Empirical Bayes is relatively slow to calculate large samples. However, ScoreEB is still much faster than mrMLM, FASTmrEMMA and ISIS EM-BLASSO, although the second step of these four methods are using the same Empirical Bayes. The speed improved of ScoreEB is mainly due to the use of score test and preconditioned conjugate gradient in the first step. At sample size of 1,000 and 2,000, ISIS EM-BLASSO is the slowest with 11.5060 and 39.5193 h, rather than FASTmrEMMA again (Table 4 and Figure 5). The possible reason is that the number of markers retained in the initial screening of ISIS EM-BLASSO is more than that of FASTmrEMMA. In summary, ScoreEB and GEMMA have considerable advantage in computational efficiency.

DISCUSSION
We have shown that the new method ScoreEB can significantly improve computational efficiency by combining the score test and Empirical Bayes within the linear mixed model, compared with popular methods mrMLM (Wang et al., 2016), FASTmrEMMA , ISIS EM-BLASSO (Tamba et al., 2017), HRePML (Ren et al., 2020) and GEMMA (Zhou and Stephens, 2012). In the application of ScoreEB to simulation studies, the proposed approach consistently recorded the higher power. More importantly, it also remained estimation accuracy of QTN effects and effectively controlled the false positive rate ( Table 1, Supplementary Table S1 and Figures 1, 2). Analysis of real Arabidopsis, rice, maize, cattle and pig data, confirmed the effectiveness of ScoreEB, which identified the most candidate genes ( Table 3, Supplementary Table S3 and Figure 3).
With the rapid growth of genomic data, computational efficiency has become a popular research issue. Existing multilocus GWAS methods, such as, mrMLM (Wang et al., 2016), pKWmEB (Ren et al., 2018), FASTmrEMMA  and MLMM (Segura et al., 2012) are all considerably slower than the single-locus method GEMMA. As described in pKWmEB paper, pKWmEB is about 21 times slower than GEMMA, and mrMLM is about 8 times slower than GEMMA. This is an important motivation for developing the multi-locus method ScoreEB. In contrast to the mrMLM method, we adopt a fast score test in initial single-locus scanning, rather than wald test. In the initial screening, we focus on the significant QTN, rather than the estimation of QTN effects, hence, the score test is a more appropriate choice. The score test only requires maximum likelihood estimation (MLE)under the null model (Song et al., 2018). Simulation studies show that ScoreEB is significantly faster than mrMLM, FASTmrEMMA, ISIS EM-BLASSO, HRePML and GEMMA (Table 4 and Figures 4, 5). And HRePML is faster than mrMLM, FASTmrEMMA and ISIS EM-BLASSO, one possible reason is that HRePML is programmed by C++ language, while other methods are developed using R language. Although the runs of the single-locus method GEMMA were slightly faster than the other multi-locus methods, its statistical power was considerably lower than that of other multi-locus methods, as a result of requiring a Bonferroni correction for multiple tests. The significance level for single-locus test is always adjusted by 0.05/m, where m is the number of markers. If multiple tests are not used in single-locus scanning to improve power, the significance level is often difficult to determine, and an inappropriate significant level will increase the false positive rate. ScoreEB provides a good solution to this problem by applying Empirical Bayes in a multi-locus model. LOD 3.0 is set as the significance level, which is widely used in other multi-locus methods (Wang et al., 2016;Ren et al., 2018;Wen et al., 2018). In addition, the new method, ScoreEB,  (Xu, 2010) is one core step on inferring the QTN effects in ScoreEB. We have noticed that the hyperparameters could affect the estimates of QTN effects. To determine the best choice of two hyperparameters (τ, ω), five-fold cross-validation test was performed at sample size 200, 500, 1,000 and 2,000, respectively (Supplementary Table S2). And the setting of hyperparameters is almost the same way as the Xu's paper (Xu, 2010). The MSE is used to evaluate the performance of ScoreEB under various hyperparameter values. And results show that the MSE is minimum when (τ, ω) is set to (0,0) at sample size 200, 500 and 2,000. It means that (τ, ω) (0, 0) (the Jeffrey's prior) is the best choice at these three sample sizes. Only when sample size is 1,000, (τ, ω) (0.5, 0) is the best choice with minimum MSE 0.00167. At this time, the MSE of (τ, ω) (0, 0) is 0.00173, which is slightly larger than that of (τ, ω) (0.5, 0) (Supplementary Table S2). It should be noted that Empirical Bayes is a component of ScoreEB, and the choice of hyperparameters is different from the direct use of Empirical Bayes. Our results demonstrate that (τ, ω) (0, 0) (the Jeffrey's prior) is robust and almost the best at different sample size.
Complex genetic architecture plays a key role in influencing the statistical power, which often leads single-locus methods to perform poorly. However, multi-locus methods can identify and account for complex genetic architectures, such as, allelic heterogeneity, and rare variant architecture (Korte and Farlow, 2013). Interestingly, genetic heterogeneity can lead to a noncausative marker being a better descriptor of the phenotype than a causative one (Platt et al., 2010). One available approach is fitting multiple SNPs in a genomic region into multi-locus mixed model, in this case, it may consider allelic heterogeneity. Another common issue is rare variant architecture, which may not always be resolved by increasing sample size. One solution is to collapse several SNPs in a region into a single indicator variable and use this as a composite genotype (Feng and Zhu, 2012). Therefore, solving complex genetic structure problems is another important motivation to develop ScoreEB.
Although we found that ScoreEB is an efficient and powerful multi-locus method, our approach is not free of limitations. ScoreEB is currently only suitable for analyzing quantitative traits, and is not available for analysis of binary traits. Binary traits are common, for example, stress tolerance in plants and case-control in human beings, and are mostly based on logistic or generalized linear models. ScoreEB detected a small number of genes also identified by the other methods (Supplementary Table  S3 and Figure 3). Although the multi-locus methods ScoreEB, mrMLM, FASTmrEMMA, ISIS EM-BLASSO and HRePML perform relatively well in simulation studies, their consistency in real data analysis is not satisfactory. It is accepted that complementarity exists between different multi-locus GWAS methods (Ren et al., 2018). At present, ScoreEB has a very high computational efficiency, when the number of individuals N is not very large (n < 3,000), such as, most plant researches. For researches with millions of individuals, we recommend BOLT-LMM (Loh et al., 2015) or fastGWA (Jiang et al., 2019). In response to these limitations, we will continue to improve ScoreEB in future work. These improvements will include: 1) Extend the approach to analyze binary trait via a link function. 2) Further explore the issue of fewer identical genes being identified compared to different methods.

CONCLUSION
In this paper, we demonstrated that ScoreEB is a fast and powerful GWAS method for quantitative trait analysis. In addition, ScoreEB has the ability to accurately estimate the QTN effect and effectively control the false positive rate. Using ScoreEB analysis can contribute to increasing our knowledge of the underlying mechanisms of complex traits and to predicting more candidate genes for molecular assisted breeding.