Using Genetic Risk Score Approaches to Infer Whether an Environmental Factor Attenuates or Exacerbates the Adverse Influence of a Candidate Gene

Some candidate genes have been robustly reported to be associated with complex traits, such as the fat mass and obesity-associated (FTO) gene on body mass index (BMI), and the fibroblast growth factor 5 (FGF5) gene on blood pressure levels. It is of interest to know whether an environmental factor (E) can attenuate or exacerbate the adverse influence of a candidate gene. To this end, we here evaluate the performance of “genetic risk score” (GRS) approaches to detect “gene-environment interactions” (G × E). In the first stage, a GRS is calculated according to the genotypes of variants in a candidate gene. In the second stage, we test whether E can significantly modify this GRS effect. This two-stage procedure can not only provide a p-value for a G × E test but also guide inferences on how E modifies the adverse effect of a gene. With systematic simulations, we compared several ways to construct a GRS. If E exacerbates the adverse influence of a gene, GRS formed by the elastic net (ENET) or the least absolute shrinkage and selection operator (LASSO) is recommended. However, the performance of ENET or LASSO will be compromised if E attenuates the adverse influence of a gene, and using the ridge regression (RIDGE) can be more powerful in this situation. Applying RIDGE to 18,424 subjects in the Taiwan Biobank, we showed that performing regular exercise can attenuate the adverse influence of the FTO gene on four obesity measures: BMI (p = 0.0009), body fat percentage (p = 0.0031), waist circumference (p = 0.0052), and hip circumference (p = 0.0001). As another example, we used RIDGE and found the FGF5 gene has a stronger effect on blood pressure in Han Chinese with a higher waist-to-hip ratio [p = 0.0013 for diastolic blood pressure (DBP) and p = 0.0027 for systolic blood pressure (SBP)]. This study provides an evaluation on the GRS approaches, which is important to infer whether E attenuates or exacerbates the adverse influence of a candidate gene.


INTRODUCTION
The detection of "gene-environment interactions" (G × E) is important and is even more challenging than the detection of main effects of genes (Greenland, 1983;Hunter, 2005;Aschard, 2016). Although some gene-based G × E methods have been developed (Jiao et al., 2013;Lin et al., 2013;Chen et al., 2014;Lin et al., 2016;, very few G × E findings have reached the genome-wide significance level (i.e., p < 0.05 20,000 = 2.5 × 10 −6 . Because there are ∼20,000 genes across the genome, 2.5 × 10 −6 is the commonly used significance level in genome-wide gene-based analyses) (Chen et al., 2014;. Most evidence of G × E was discovered by candidate gene analyses in which genome-wide association study (GWAS) hits were targeted. For example, physical activity has been found to attenuate the influence of the fat mass and obesityassociated (FTO) gene on obesity risk (Vimaleswaran et al., 2009;Kilpelainen et al., 2011). This means that the association of the FTO risk alleles with obesity measures is weaker in physically active subjects than in physically inactive subjects.
Another example is related to blood pressure levels. The fibroblast growth factor 5 (FGF5) gene is associated with blood pressure levels in Han Chinese (Liu et al., 2011;. Obesity has been reported to exacerbate the adverse influence of FGF5 on blood pressure. That is, the association of the FGF5 risk alleles with blood pressure is stronger in obese subjects than in lean subjects (Li et al., 2015).
A candidate gene usually harbors multiple risk variants. It is of interest to know whether an environmental factor (E) attenuates or exacerbates the effects of these risk variants. However, most G × E methods provide only p-values without any inference for the direction of interaction (Lin et al., 2013(Lin et al., , 2016Chen et al., 2014). In contrast, a genetic risk score (GRS) approach aggregates the effects among an ensemble of single-nucleotide polymorphisms (SNPs) and can indicate whether the GRS interacts synergistically or antagonistically with E. A GRS is a linear combination of risk allele counts, where risk alleles and weights are usually retrieved from large published GWASs or meta-analyses. However, the vast majority of genetic studies have been performed on subjects of European ancestry (Sirugo et al., 2019). Appropriate external GWAS results may not be available for G × E studies in other ethnic populations.
To address this issue, GRS approaches using internal weights have been proposed for pathway-based G × E studies  and GWASs (Lin et al., 2018(Lin et al., , 2020. The internal weights come from marginal effects of SNPs that can be estimated by a multivariate elastic net (ENET) regression .
In this study, we described how to use GRS approaches to infer whether E attenuates or exacerbates the adverse influence of a gene (therefore, GRS here aggregates the effects among an ensemble of SNPs in a gene). Moreover, we compared GRS approaches with the "Set-Based gene-EnviRonment InterAction test" (SBERIA) (Jiao et al., 2013), the "interaction sequence kernel association test" (iSKAT) (Lin et al., 2016), and our previously developed "adaptive combination of Bayes factors method" (ADABF) . These methods were then applied to the Taiwan Biobank (TWB) data.

Genetic Risk Score Approaches
Usually, a GRS combines information of multiple nearly independent SNPs across the genome (Ahmad et al., 2013;Rask-Andersen et al., 2017;Lin et al., , 2020. However, there have been few applications of using GRS in gene-based G × E tests. To have a clear description, we first explain covariate adjustment in the GRS gene-based G × E approach. Suppose there are L SNPs in a gene or a pathway. Let g [·] be the link function, Y i be the phenotype that can be either continuous or binary, G ij be the number of minor alleles (0, 1, or 2, by additive genetic model) at the j-th SNP (j = 1, . . ., L), E i be the environmental factor, X i be the vector of nongenetic covariates such as the age and sex, and the subscript "i" represents data for the i-th subject (i = 1,· · ·, n). In addition to the additive model (by counting the number of minor alleles), G ij can also be coded according to the dominant or recessive genetic model. Some GRS approaches involve SNP selection and then aggregate the information of selected SNPs. Therefore, we leave G ij (j = 1, ..., L) later and first regress Y i on X i by a linear model or a logistic regression model, as follows: (1) (for binary Y i ) be the predicted mean of Y i under model (1). Therefore, the covariate-adjusted phenotype for the i-th subject is ε i = Y i − µ 0i . Subsequently, we regress ε i on G i1 , · · · , G iL , as follows: Let β = β 0 · · · β L be the vector of regression coefficients in model (2). The ordinary least squares (OLS) estimate of β is as follows: where n is the sample size, G is the n × (L + 1) matrix with the i-th row of [1 G i1 · · · G iL ], and ε is the n-length vector of covariate-adjusted phenotypes. Because of linkage disequilibrium (LD), the SNPs in a gene are usually highly correlated with each other. In this situation, G G may be singular and not invertible. Ridge regression (RIDGE) (Hoerl and Kennard, 1970) is used to address the collinearity problem in model (2), where the regression coefficients are estimated by minimizing the residual sum of squares and an l 2 -norm penalty, as follows: The regularization parameter λ controls the amount of shrinkage. When λ is close to 0, β from RIDGE will approximate the β from OLS. When λ is large, β 1 , · · · , β L will approach 0, and model (2) will be reduced to an intercept-only model. Least absolute shrinkage and selection operator (LASSO) was later proposed to estimate regression coefficients by minimizing the residual sum of squares and an l 1 -norm penalty (Tibshirani, 1996) as follows: The regularization parameter λ controls the amount of shrinkage. Moreover, variable selection can be performed by shrinking some β j s to 0 (j = 1, ..., L). Because equation (5) selects SNPs that are marginally associated with the covariate-adjusted phenotype ( ε i , i = 1,· · ·, n), this is called the marginal-association filtering by LASSO. The term "marginal" is used here because the E i and G ij × E i interaction term have not been included in Eq. (5). If β j is shrunk to 0, the j-th SNP is regarded as unassociated with the covariate-adjusted phenotype and will not be used for the construction of a GRS. ENET (Zou and Hastie, 2005) strikes a balance between RIDGE and LASSO by estimating regression coefficients while minimizing the residual sum of squares and a mixture of an l 1 -norm and an l 2 -norm, as follows: The regularization parameter λ controls the amount of shrinkage, whereas α is a penalty weight ranging from 0 (RIDGE) to 1 (LASSO). As suggested by , α = 0.5 is used for ENET throughout this work to achieve an optimal balance between RIDGE and LASSO. Eq. (6) is the marginal-association filtering by ENET. Similar to genetic studies using penalized regression approaches (Waldmann et al., 2013;, we used the R package "glmnet" (Friedman et al., 2010) to obtain β. The optimal values of the regularization parameter λ in Eqs (4-6) were determined by 10-fold cross-validation. GWASs (Waldmann et al., 2013) and pathway-based G × E studies  have recommended choosing the largest λ such that the mean squared error (MSE) is within 1 standard error of the minimum MSE, to avoid selecting too many SNPs in ENET or LASSO. However, most complex traits are polygenic, and a single gene usually explains little phenotypic variation. In our simulations and real data analyses, this criterion usually selected 0 SNPs for ENET or LASSO. Therefore, in gene-based G × E studies, we recommend using the λ that leads to the minimum MSE.
RIDGE, LASSO, and ENET are all techniques for regression models that suffer from multicollinearity. Therefore, the pruning stage to remove SNPs with LD is not required here. After obtaining β from RIDGE, LASSO, or ENET, the GRS of the i-th subject is constructed by GRS i = L j = 1 β j G ij , where G ij is the number of minor alleles (0, 1, or 2) at the j-th SNPof the i-th subject. A positive β j indicates that the minor allele is associated with an increase in phenotype. Therefore, a subject with more copies of the minor allele (more phenotype-increasing alleles) will obtain an increase in his/her GRS . In contrast, a negative β j indicates that the minor allele is associated with a decrease in phenotype, and a subject with more copies of the minor allele (more phenotype-decreasing alleles) will obtain a decrease in his/her GRS . The GRS is then transformed into a Z-score that represents how many standard deviations the GRS is from the mean. The standardized GRS Z-score is denoted as GRS i . A larger GRS is always associated with an increased phenotype.
Afterwards, we fit the following generalized linear model (GLM): where γ G > 0 because a larger GRS is always associated with an increased phenotype. Adding a constraint (γ G > 0) is expected to improve power, although for simplicity we here perform a GLM without this constraint. By testing H 0 : γ Int = 0 vs. H 1 : γ Int = 0, we evaluate whether G × E exists. For continuous traits, each 1 standard deviation (SD) increase in GRS is associated with a γ Int change in phenotype for subjects exposed to E = 1 than for subjects exposed to E = 0. For binary traits, each 1 SD increase in GRS is associated with an odds ratio of exp( γ Int ) for subjects exposed to E = 1 compared to subjects exposed to E = 0. A positive (negative) γ Int indicates that E = 1, or a larger E, exacerbates (attenuates) the adverse influence of a candidate gene.
The whole data can be used to estimate β 1 , · · · , β L (Eqs 4-6) and then to test H 0 : γ Int = 0 vs. H 1 :γ Int = 0 (model 7) without data splitting. Theoretically, β j (j = 1, · · · , L) and γ Int are asymptotically independent under the null hypothesis of no SNP-by-environment interaction, proved by corollary 1 of Dai et al. (2012). A two-stage approach that first filters SNPs by a criterion independent of the test statistic ( γ Int estimated from model 7) under the null hypothesis, and then only uses SNPs that pass the filter, can maintain type I error rates and boost power (Bourgon et al., 2010;Frost et al., 2016). Empirically, our following simulation studies confirmed that GRS using internal weights is a valid approach in the sense that type I error rates match the nominal significance level. Moreover, studies on genes (Jiao et al., 2013;Frost et al., 2016), pathways , and GWASs (Lin et al., 2018) have presented the validity of using internally weighted GRSs to test for G × E.
SBERIA is the first gene-based G × E test based on the GRS concept. The phenotype is first regressed on each SNP separately, while adjusting for covariates such as sex, age, and ancestry principal components (PCs), as follows: (8) Let β j be the estimated regression coefficient of the j-th SNP and p j be the p-value of testing H 0 : β j = 0 vs. H 1 : 1 is an indicator variable with a value of 1 if p j < 0.1 and 0 otherwise, sign β j is either 1 or −1 depending on the sign of β j , and ν is a very small value (e.g., 0.0001). Therefore, SBERIA GRS includes both SNP selection (only SNPs marginally associated with the phenotype are used to construct the GRS) and SNP weighting (only the directions of how SNPs influence the phenotype are used in the GRS). The standardized GRS Z-score is denoted as GRS i . Then, the following GLM is fitted: By testing H 0 :γ Int = 0 vs. H 1 :γ Int = 0, we can evaluate whether G × E exists (Jiao et al., 2013).
There are two fundamental differences between SBERIA and ENET (or LASSO). First, in the marginal-association filtering stage, SBERIA fits L regression models, respectively (Eq. 8), whereas ENET and LASSO fit a multivariate model incorporating L SNPs simultaneously (Eqs 5 and 6). Second, in the model for testing GRS × E, SBERIA includes the main effects of all the L SNPs ( L j = 1 γ G j G ij in Eq. 9), whereas ENET and LASSO incorporate only a GRS term as the aggregated genetic main effects (γ G GRS i in Eq. 7).
We compared the abovementioned GRS tests with iSKAT (Lin et al., 2016) and ADABF . In iSKAT, the following model is considered: where δ Int j is the interaction effect between the j-th SNP and E. Assuming δ Int j s(j = 1,· · ·,L) follow a distribution with a mean of 0 and a variance of τ, the null hypothesis of all δ Int j s = 0 (j = 1,· · ·,L) can be reduced to τ = 0. The iSKAT is a score statistic for testing the variance component, i.e., H 0 :τ = 0 vs. H 1 :τ > 0. The statistic can be referred to as Eq. (6) in Lin et al. (2016). The iSKAT method is regarded as optimal in the class of variance component tests (Lin et al., 2013;Chen et al., 2014). Therefore, we chose iSKAT as the representative of variance component tests.
In ADABF, we consider the following model for the j-th SNP (j = 1,· · ·, L): The SNP × E interaction is of interest, and therefore, H 0 :δ Int j = 0 vs. H 1 :δ Int j = 0 for the j-th SNP (j = 1,· · ·, L). A Bayes factor (BF) was calculated for each SNP × E, where BF = Pr (Data|H 1 )/Pr (Data|H 0 ). A larger BF indicates that the relative evidence in favor of H 1 (SNP × E interaction exists) is stronger. Because the number of SNPs exhibiting interactions with E varies gene by gene, ADABF exhaustively searches for the evidence of G × E interaction by considering the largest BF, combining the largest 2 BFs, combining the largest 3 BFs,. . ., to aggregating all the L BFs in the gene. The significance of G × E interaction is then determined by the efficient sequential resampling procedure (Liu et al., 2016). ADABF was selected for comparison because it was recommended as a powerful and robust gene-based G × E test in a recent study . The iSKAT and ADABF methods do not test for G × E through a GRS term. Therefore, these two methods do not make inference for the direction of G × E.

Simulation Study
To reflect the real LD structures of the human genome, we used the genotypes of 18,424 TWB subjects as our simulation material. Three genes (i.e., TNNT3, RFX3, and FTO) were drawn for simulations, including 48, 95, and 242 SNPs, respectively. It is common to see multiple trait-associated SNPs to be included in the same gene (Willer et al., 2007;Fawcett and Barroso, 2010). Therefore, we randomly specified 4 trait-associated SNPs in each simulation replicate. We considered binary and continuous exposures, respectively. For binary exposures, E was randomly sampled from 1 (exposed) or 0 (non-exposed), with P (E = 1) = 0.2 and P (E = 1) = 0.5, respectively. For continuous exposures, E was randomly selected from the normal distribution with a mean of 0 and a standard deviation of 0.5.
The continuous trait of the i-th subject was simulated as follows: where β G d is the SNP main effect of the d-th trait-associated SNP (d = 1, · · · , 4), β Int d is the effect size of interaction between the d-th trait-associated SNP and E (d = 1,· · ·, D), D is the number of trait-associated SNPs that also exhibit interactions Frontiers in Genetics | www.frontiersin.org with E (D = 4 or 2 in our simulation), and ε i is the random error term following the standard normal distribution.
The binary trait of the i-th subject was simulated as follows: where the intercept β 0 was log(0.1/0.9) = −2.2 or log(0.4/0.6) = −0.4. Y i = 1 represents that the i-th subject was diseased whereas Y i = 0 indicates that he/she was non-diseased. This setting corresponds to a disease prevalence of 10% or 40%. A disease prevalence of 40% is also a reasonable setting for some complex diseases. For example, the worldwide prevalence of hypertension among adults aged ≥25 years was ∼40% (Abebe et al., 2015). The magnitudes of SNP main effects, β G d s (d = 1,· · ·, 4), were uniformly sampled from 0.04 to 0.08 for continuous traits or uniformly sampled from log (1.05) to log (1.15) for binary traits. This simulation setting for binary trait (odds ratios uniformly sampled from 1.05 to 1.15) concurs with broad GWAS findings, where most odds ratios < 1.5 and many odds ratios < 1.2 (Baxter and Jordan, 2012;Hodge and Greenberg, 2016).
Unlike small effect sizes for SNPs, the effect size of E was assumed to be larger, because some E is critical to traits (e.g., regular exercise and dietary habits are important to obesity measures). The magnitude of E, |β E |, was assigned to be 0.3 for continuous traits and log (1.3) = 0.2624 for binary traits, respectively. When evaluating type I error rates, data have to be generated from the null hypothesis of no SNP × E interactions, and therefore we specified all β Int d s = 0 (d = 1,· · ·, D). When assessing power, the magnitudes of SNP × E interaction effects, |β Int d |s (d = 1,· · ·, D), were uniformly sampled from 0.04 to 0.08 for continuous traits or uniformly sampled from log (1.05) to log (1.15) for binary traits. To not favor GRS-based approaches that construct GRSs with SNPs marginal effects, the sampling of β Int d s (d = 1,· · ·, D) was independent of the sampling of β G d s (d = 1,· · ·, 4). Power was evaluated under 14 simulation scenarios listed in Table 1, in which "+" indicates a positive effect and "−" means a negative effect. As summarized by Table 1, we always assumed 4 trait-associated SNPs, but the number of traitassociated SNPs that also exhibited interactions with E could be 2 or 4. This setting mimics the common situation that some traitassociated SNPs may also present interactions with E (Jonsson et al., 2009;Chen et al., 2014;Li et al., 2015).
Scenarios 1-7 describe that a larger E is associated with an increase in trait, whereas scenarios 8-14 represent that a larger E is associated with a decrease in trait. Suppose that a higher trait is linked to a less healthy situation, e.g., Y i = 1 in binary traits. Scenarios 1, 3, 6, 8, 10, and 13 indicate that a larger E exacerbates the adverse effect of a candidate gene, whereas scenarios 2, 4, 7, 9, 11, and 14 present that a larger E attenuates the adverse effect of a candidate gene. Scenarios 5 and 12 are cross-over situations, representing that a larger E exacerbates the adverse effect of 50% of trait-associated SNPs while attenuating the adverse effect of the remaining 50% of trait-associated SNPs.

Application to the Taiwan Biobank Data
The TWB aims to build a research database that integrates the genomic profiles and lifestyle patterns of residents aged 30-70 years in Taiwan . Community-based volunteers provided blood samples and a range of information via a face-to-face interview and physical examination. This study included 20,287 TWB individuals and was approved by the Research Ethics Committee of the National Taiwan University Hospital (NTUH-REC No. 201805050RINB). To remove cryptic relatedness, we used PLINK 1.9 (Purcell et al., 2007) to calculate the genome-wide identity by descent (IBD) sharing coefficients between any two subjects. Similar to many genetic studies (Lowe et al., 2009;Mok et al., 2014;Ombrello et al., 2014), we excluded one subject from each pair with PI-HAT ≥ 0.125, where PI-HAT = Probability (IBD = 2) + 0.5 × Probability (IBD = 1). Through this step, relatives within the third-degree of consanguinity were removed. Finally, 18,424 unrelated subjects (9,093 males and 9,331 females) were remained in the following analysis. Table 2 shows basic characteristics of TWB participants stratified by sex.
The majority of TWB subjects were of Han Chinese ancestry . The Axiom Genome-Wide TWB genotyping array was designed for Han Chinese in Taiwan, which was run on the Axiom Genome-Wide Array Plate System (Affymetrix, Santa Clara, CA, United States). A total of 646,783 autosomal SNPs were genotyped in this TWB array. After removing 51,293 SNPs with genotyping rates <95%, 6,095 SNPs with Hardy-Weinberg test p-values < 5.7 × 10 −7 (WTCCC, 2007), and 1,869 variants with minor allele frequencies (MAFs) <1%, 587,526 SNPs were used to construct ancestry PCs. Because variants with MAF <1% have been excluded, no rare variants were included in our following analyses. Removing variants with MAFs <1% is a commonly used quality control step in genetic association studies, because the chances of errors in genotype calling increase with decreasing MAFs (Goldstein et al., 2012;Coleman et al., 2016).

Type I Error Rates
To evaluate type I error rates, continuous traits and binary traits were simulated according to models (12) and (13), respectively. Scenarios 1 and 8 in Table 1 were simulated, but all β Int d s (d = 1,· · ·, 4) have to be set at 0 in order to evaluate type I error rates. The main effects of SNPs, β G d s (d = 1,· · ·, 4), and the environmental factor, |β E |, have been described in section "Simulation study." Based on 10,000 replications for each scenario, Figure 1 and Supplementary Figure S1 show that all methods were valid in the sense that their type I error rates matched the nominal significance level.

Power of 6 G × E Methods
We compared the power of the 4 GRS-based tests and 2 G × E methods: iSKAT (Lin et al., 2016) and ADABF . The power of the 4 GRS-based tests is defined as the probability of rejecting H 0 :γ Int = 0 (p-value < 0.05) and correctly specifying the sign of γ Int (γ Int can be found from models 7 and 9). The TABLE 1 | The 14 simulation scenarios for power comparison, where "exacerbation" and "attenuation" mean that E = 1 (or a larger continuous E) exacerbates or attenuates the adverse effect of a candidate gene, respectively.  iSKAT (Lin et al., 2016) and ADABF  methods can only provide a p-value for testing G × E, without an inference of how E modifies the genetic effects. Therefore, their power is defined as the probability of rejecting the null hypothesis of no G × E (p-value < 0.05). Figure 2 presents the power based on 1,000 simulation replications under each scenario, for continuous traits and P (E = 1) = 0.2. If E exacerbates the adverse effect of a gene, ENET and LASSO outperform the other methods (scenarios 1, 3, 6, 8, 10, and 13). If E attenuates the adverse effect of a gene, RIDGE is more powerful (scenarios 2, 4, 7, 9, 11, and 14). Moreover, ADABF was the optimal test under the cross-over scenario (scenarios 5 and 12). Among the 4 GRS-based tests, ENET, LASSO, and SBERIA first select SNPs marginally associated with the phenotype. ENET and LASSO select SNPs according to the multivariate ENET regression (Eq. 6) and multivariate LASSO regression (Eq. 5), respectively. SBERIA selects SNPs based on regressions considering one SNP at a time (Eq. 8). The performance of these three methods to detect G × E depends on their ability to find the true trait-associated SNPs. Supplementary Figures S2, S3 summarize the sensitivity (SEN) and positive predictive value (PPV) of the marginal-association filtering in ENET, LASSO, and SBERIA. As described in Eq. (12) and Table 1, 4 trait-associated SNPs were randomly assigned in each simulation replication. SEN is defined as the percentage of true positives among the 4 trait-associated SNPs, whereas PPV is defined as the percentage of true positives among the total findings.
Because SBERIA takes a liberal p-value threshold of 0.1, in the filtering stage it usually selects more SNPs than ENET and LASSO do. Therefore, as shown in Supplementary Figures S2, S3, SBERIA generally has a higher SEN but a lower PPV, compared with ENET and LASSO. For each method, SEN was much lower in attenuation scenarios than in exacerbation scenarios. This means that pinpointing true trait-associated SNPs is more difficult in attenuation scenarios. In the filtering stage (Eqs 5, 6, and 8), G id × E i cannot be included; therefore, a β Int d in the opposite direction to β G d will weaken the magnitude of the marginal effect of the d-th trait-associated SNP. In contrast, a β Int d in the same direction to β G d will strengthen the magnitude of the marginal effect of the d-th trait-associated SNP. Therefore, for each method, the ability to pinpoint true trait-associated SNPs is inferior in attenuation scenarios than in exacerbation scenarios (as shown in Supplementary Figures S2, S16 for continuous traits and binary traits, respectively).
Because there is no SNP selection in RIDGE, true traitassociated SNPs are all reserved for constructing GRS. Therefore, RIDGE is the optimal GRS-based test in attenuation scenarios (Figure 2). In exacerbation scenarios, ENET and LASSO are the best two methods because of their superior ability in finding trait-associated SNPs (Supplementary Figures S2, S16).
We then investigated the percentage of sign-misspecifications for each GRS-based method. In the presence of G × E, we calculated the percentage of wrongly specifying the sign of γ Int (in models 7 or 9) among all rejections of H 0 :γ Int = 0 (p-value < 0.05). With 1,000 simulation replications under each scenario, Figure 3 presents the percentages of signmisspecifications, for continuous traits and P (E = 1) = 0.2. The true γ Int is positive under scenarios 1, 3, 6, 8, 10, and 13; negative under scenarios 2, 4, 7, 9, 11, and 14. The cross-over scenarios (5 and 12) were not considered here, because the true sign of γ Int was unclear when E = 1 (or a larger continuous E) exacerbates the adverse effect of 50% of trait-associated SNPs but attenuates the adverse effect of the remaining 50% of traitassociated SNPs. Figure 3 shows that the signs of γ Int were usually correctly specified except scenarios 4 and 11. SBERIA generally makes more mistakes in indicating the true direction for γ Int , because it filters SNPs by considering only one SNP at a time (Eq. 8). Supplementary Figures S4-S7 present the power and percentages of sign-misspecifications in continuous-trait simulations, given P (E = 1) = 0.5 and a continuous E, respectively. Each method under P (E = 1) = 0.5 was slightly more powerful than that under P (E = 1) = 0.2, but the comparisons across methods were similar to those shown in Figures 2, 3.
The performance of each method for binary-trait simulations can be found in Supplementary Figures S8-S21. Regarding different prevalence values, each method was more powerful under P (Y = 1) = 0.4 than under P (Y = 1) = 0.1. Concerning different probabilities of being exposed, each method was more powerful under P (E = 1) = 0.5 than under P (E = 1) = 0.2. Nonetheless, the comparisons across methods were similar to those described for continuous traits.
Regarding the 14 simulation scenarios listed in Table 1, only 2 SNP × E interactions (rather than 4) were simulated in scenarios 4 and 11, and β Int s were in the opposite direction to β G s. Therefore, detecting G × E and correctly specifying the sign for γ Int were more challenging under scenarios 4 and 11.
Regarding the computation time (Supplementary Figures  S22-S29), SBERIA is the fastest method, followed by the three penalized regression methods: RIDGE, ENET, and LASSO. ADABF uses the sequential resampling procedure (Liu et al., 2016), and therefore, it takes a longer time in the presence of G × E. A recent study concluded that ADABF is a computationally feasible method in a GWAS context . This is because in a real GWAS usually very few genes can be detected to interact with E. Therefore, a genome-wide analysis using ADABF takes a much shorter time than the power evaluation performed here (power is always evaluated in the presence of G × E). On average, the iSKAT method requires the longest time in most situations (Supplementary Figures S22-S29).

FTO × Exercise Interaction on Five Obesity Measures
The FTO gene is obesity-associated and has been replicated by many studies (Frayling et al., 2007;Scuteri et al., 2007;Fawcett and Barroso, 2010). It spans from 53, 737, 875 to 54, 148, 379 base pairs on chromosome 16, according to the human genome GRCh37/hg19 assembly. Similar to many gene-based tests (Liu et al., 2010;Wray et al., 2012;Alonso-Gonzalez et al., 2019;, 50 kb in the 3 and 5 regions that might regulate a gene were also incorporated in our analysis. Specifying a large boundary would be difficult to pinpoint the exact gene interacting with E, whereas a small boundary may not fully capture the regulatory regions (Liu et al., 2010). A total of 242 SNPs (all with MAF > 1%) were included in this chromosomal region. Here, we used the 6 G × E tests to investigate whether the influence of FTO can be modified by performing regular physical exercise. A total of five obesity measures were analyzed: body mass index (BMI), body fat percentage (BFP), waist circumference (WC), hip circumference (HC), and the waist-to-hip ratio (WHR).
Regular physical exercise was defined as engaging in 30 min of "exercise" three times a week. "Exercise" indicated leisure-time activities such as jogging, yoga, mountain climbing, or playing basketball. According to the TWB questionnaire, activities during work were not counted in "exercise." Among the 18,424 subjects, 7,652 (41.5%) reported performing regular exercise, while 10,764 reported no regular exercise. A total of eight subjects did not respond to this question. Covariates adjusted in all models included sex, age (in years), educational attainment, drinking status, smoking status, and the first 10 ancestry PCs. Table 3 presents the results of six methods. We may miss possibly important findings due to a harsh penalty of multiple testing. Because this study focuses on G × E detection for candidate genes, we set the significance level at 0.05. The results of RIDGE show that regular physical exercise attenuates the adverse effect of the FTO gene on BMI, BFP, WC, and HC. The other three GRS-based tests (ENET, LASSO, and SBERIA) provided significant results for three obesity measures, respectively. All γ Int s were negative, indicating that regularly performing exercise attenuates the adverse influence of FTO on obesity measures.
Genetic risk score-based tests are more powerful than iSKAT and ADABF because performing regular exercise generally blunted the effects of the trait-increasing alleles in FTO. Take BMI-associated SNPs as examples. A total of 12 out of the 242 SNPs reached the genome-wide significance (p-value < 5 × 10 −8 ) when testing H 0 : β j = 0 vs. H 1 :β j = 0 in the following model: Where Y i is BMI, G ij is the number of minor alleles (0, 1, or 2) at the j-th SNP (j = 1, . . ., 242), and X i is the vector of covariates of the i-th subject (i = 1,· · ·, 18, 424). All β j s of the 12 genomewide significant SNPs were positive, representing that their minor alleles were associated with increased BMIs. When exercise (E i = 1 if yes; E i = 0 if no) and the SNP × exercise interaction were included in the 12 models, as follows: all β j s were positive, and all β Intj s were negative (j = 1,· · ·, 12). This indicated that the minor alleles of the 12 SNPs were associated with increased BMIs, but their effects were blunted by performing regular exercise. As mentioned above, a β Intj in the opposite direction to β j weakens the magnitude of marginal effect of this SNP, and therefore β j from model (15) > β j from model (14). This is a situation similar to our simulation scenario 9 in Table 1, because exercise is associated with a decrease in obesity measures. According to Figure 2 (scenario 9, 242 SNPs), RIDGE is the most powerful test. In real data, the FTO gene harbors more BMI-associated SNPs than our simulation scenario 9, and these SNPs interact with E in the same direction. Therefore, GRS-based tests are more powerful than iSKAT and ADABF.  According to RIDGE, each 1 SD increase in BMI-GRS was associated with a 0.1743 kg/m 2 lower BMI in exercisers than in non-exercisers (p = 0.0009, Table 3). Each 1 SD increase in BFP-GRS was associated with a 0.2661% lower BFP in exercisers than in non-exercisers (p = 0.0031). Each 1 SD increase in WC-GRS was associated with a 0.3854 cm lower WC in exercisers than in non-exercisers (p = 0.0052). Each 1 SD increase in HC-GRS was associated with a 0.3868 cm lower HC in exercisers than in non-exercisers (p = 0.0001). As shown by Figure 4, except WHR, GRS effect on each obesity measure was smaller in exercisers than in non-exercisers. For each obesity measure, γ Int in Table 3 is approximately equal to the length of blue bar -the length of red bar in Figure 4. In the whole study population, each 1 SD increase in BMI-GRS was associated with a 0.2420 kg/m 2 (the length of orange bar in Figure 4A) higher BMI (P = 1.1 × 10 −20 ). This association was stronger in non-exercisers than in exercisers (P Int = 0.0009, Table 3). In non-exercisers, each 1 SD increase in BMI-GRS was associated with a 0.3136 kg/m 2 (the length of red bar in Figure 4A) higher BMI (P = 3.4 × 10 −18 ); in exercisers, each 1 SD increase in BMI-GRS was associated with a 0.1382 kg/m 2 (the length of blue bar in Figure 4A) higher BMI (P = 1.4 × 10 −4 ).
Performing regular exercises is associated with an attenuation of the adverse influence of the FTO gene on both WC and HC, but not on WHR because it is a ratio of WC to HC. This negative result for WHR is in line with the result from polygenic analyses (i.e., aggregating information of multiple nearly independent SNPs across the genome) .

FGF5 × WHR Interaction on Blood Pressure Levels
In addition to binary exposures, we also provide an example for continuous exposures. The FGF5 gene is associated with blood pressure levels in Han Chinese (Liu et al., 2011;. FGF5 spans from 81, 187, 742 to 81, 212, 171 base pairs on chromosome 4, according to the human genome GRCh37/hg19 assembly. Similar to many gene-based tests (Liu et al., 2010), our analysis region also included 50 kb in the 3 and 5 regions that might regulate the gene. A total of 38 SNPs (all with MAF >1%) were included in this chromosomal region. It has been known that central obesity is a risk factor for hypertension (Parrinello et al., 1996). Therefore, we used the 6 G × E tests to investigate whether the influence of FGF5 can be modified by an indicator of central obesity, WHR. Covariates adjusted in all models included sex, age (in years), drinking status, smoking status, and the first 10 ancestry PCs. As described above, a total of 18,424 unrelated TWB subjects were included in our analysis, where 9,093 were males and 9,331 were females. Table 4 presents the results of six methods. All six tests suggested the significance of FGF5 × WHR interaction on DBP and SBP (P Int < 0.05). Positive γ Int s from GRS-based tests represent that a higher WHR exacerbates the adverse influence of FGF5.
The above two examples demonstrate using GRS-based methods to infer whether E attenuates or exacerbates the adverse influence of a candidate gene (FTO and FGF5, respectively).

DISCUSSION
Detecting G × E has been an important but challenging issue. Although set-based G × E tests have been evaluated in a genome-wide context, very few genes have been found to interact with some exposures at the genome-wide significance level (i.e., p < 0.05 20000 = 2.5 × 10 −6 . Because there are ∼20,000 genes across the genome, 2.5 × 10 −6 is the commonly used significance level in genome-wide gene-based analyses) (Chen et al., 2014;. Searching for evidence of G × E for candidate genes may still be a more feasible strategy. It is common to see a GRS that aggregates information of multiple nearly independent SNPs across the genome (Ahmad et al., 2013;Rask-Andersen et al., 2017;Lin et al., , 2020. However, there have been few applications of using GRS in genebased G × E tests. In this work, we evaluate the performance of GRS gene-based G × E approach, with simulations and real applications. GRS-based tests can outperform other methods if interactions tend to go in the same direction (Aschard, 2016). The filtering stage of SBERIA overlooks LD among SNPs because it considers L SNPs in L respective regressions (Eq. 8). RIDGE has the advantage of accommodating LD among SNPs and providing regression coefficients that lead to minimum MSE. Although RIDGE has been used to address LD in genetic association analysis (Malo et al., 2008), our study is the first attempt to use the ridge regression to construct a GRS for G × E analyses. ENET and LASSO can not only accommodate LD among SNPs, but they also select SNPs by shrinking some regression coefficients to 0.
Although ENET has been suggested as a powerful G × E test in pathway analyses , in this study, we showed that its performance to find traitassociated SNPs can be compromised if E attenuates the adverse effect of a gene (simulation scenarios 2, 4, 7, 9, 11, and 14). As a result, RIDGE is recommended if E attenuates the adverse effects of most SNPs in a gene, as shown in the application of FTO × exercise interaction on obesity measures (Table 3). In contrast, if E exacerbates the adverse effects of most SNPs in a gene, ENET and LASSO will be more powerful, as shown in the application of FGF5 × WHR interaction on SBP (Table 4).
In this work, we described how to use GRS approaches to infer whether E attenuates or exacerbates the adverse effect of a gene. GRS approaches can not only provide a p-value for G × E but also infer how E interacts with the gene. The signs of γ Int were usually correctly specified except scenarios 4 and 11, where only 2 SNP × E interactions were specified and their interaction effects were in the opposite direction to the SNP main effects. SBERIA usually makes more mistakes because its filtering stage overlooks LD among SNPs by considering L SNPs in L respective regressions (Eq. 8).
In the cross-over situations (simulation scenarios 5 and 12), where E exacerbates the adverse effects of some SNPs but attenuates the adverse effects of others, GRS approaches were underpowered and ADABF became the most useful. None of the six methods could outperform the others across all situations. In this work, we performed simulations to explore the relative performances of these six tests in detecting G × E and correctly indicating the interaction direction. To show the utility of GRS-based approaches, we provided an example for binary exposure and an example for continuous exposure: (1) FTO × exercise interaction on five obesity measures ( Table 3); (2) FGF5 × WHR interaction on blood pressure levels ( Table 4). The γ Int s in Table 3 were consistently negative, representing that performing regular exercise attenuates the adverse influence of the FTO gene on four obesity measures. Moreover, γ Int s in Table 4 were consistently positive, indicating that a higher WHR exacerbates the adverse effect of the FGF5 gene on blood pressure levels.
This work provides contributions to an important issue: identify whether E attenuates or exacerbates the adverse influence of a candidate gene. Genes exhibiting interactions with E are usually difficult to detect and replicate (McAllister et al., 2017), especially at the genome-wide significance level (2.5 × 10 −6 ). When testing for many genes simultaneously, even true positive G × Es would have difficulty in standing out among all the noise (Lin and Lee, 2010). Therefore, our discussions here are restricted to candidate genes. As TWB keeps recruiting more subjects, in the future we look forward to performing genome-wide G × E analyses on a larger TWB cohort.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the Research Ethics Committee of the National Taiwan University Hospital (NTUH-REC No. 201805050RINB). The patients/participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
W-YL developed the methods and the analysis tools, designed and performed the simulation studies, analyzed the TWB data, and wrote the manuscript. Y-SL contributed to the programming for the simulations and data analyses. C-CC, Y-LL, S-JT, and P-HK contributed to the writing of the manuscript. P-HK, Y-LL, and S-JT provided the TWB data. All authors reviewed the manuscript.