- 1Institute of Health Data Analytics and Statistics, College of Public Health, National Taiwan University, Taipei, Taiwan
- 2Master of Public Health Program, College of Public Health, National Taiwan University, Taipei, Taiwan
Introduction: Detection of variance quantitative trait loci (vQTL) can facilitate the discovery of gene-environment (GxE) and gene-gene interactions (GxG). Identifying vQTLs before direct GxE and GxG analyses can considerably reduce the number of tests and the multiple-testing penalty.
Methods: Despite some methods proposed for vQTL detection, few studies have performed a head-to-head comparison simultaneously concerning false positive rates (FPRs), power, and computational time. This work compares three parametric and two non-parametric vQTL tests.
Results: Simulation studies show that the deviation regression model (DRM) and Kruskal-Wallis test (KW) are the most recommended parametric and non-parametric tests, respectively. The quantile integral linear model (QUAIL, non-parametric) appropriately preserves the FPR under normally or non-normally distributed traits. However, its power is never among the optimal choices, and its computational time is much longer than that of competitors. The Brown-Forsythe test (BF, parametric) can suffer from severe inflation in FPR when SNP’s minor allele frequencies <0.2. The double generalized linear model (DGLM, parametric) is not valid for non-normally distributed traits, although it is the most powerful method for normally distributed traits.
Discussion: Considering the robustness (to outliers) and computation time, I chose KW to analyze four lipid traits in the Taiwan Biobank. I further showed that GxE and GxG were enriched among 30 vQTLs identified from the four lipid traits.
1 Introduction
Omitting essential predictors from a regression model can lead to heteroscedasticity (Breusch and Pagan, 1979). In genetic analyses, loci with unequal phenotypic variances across different genotype groups are called “variance quantitative trait loci” (vQTLs), which can be caused by omitting gene-environment interaction (GxE) or gene-gene interaction (GxG) in the regression model (Pare et al., 2010; Ronnegard and Valdar, 2012; Shi, 2022; Wang et al., 2019). For example, Wang et al. performed a genome-wide vQTL analysis of 5.6 million variants on ∼350,000 unrelated individuals of European ancestry for 13 quantitative traits. They identified 75 significant vQTLs for nine traits, especially those related to obesity. Pervasive GxE effects for obesity-related traits were further explored through direct GxE analyses. Wang et al.‘s study has demonstrated the detection of GxE without environmental data (Wang et al., 2019).
Methods for detecting vQTLs are categorized as parametric or non-parametric. The development of parametric vQTL tests usually depends on the assumption of normally distributed traits (Marderstein et al., 2021; Young et al., 2018). Contrastingly, non-parametric vQTL tests are generally more robust to trait distributions (Miao et al., 2022).
For every single-nucleotide polymorphism (SNP), we can use the Brown-Forsythe (BF) test (Brown and Forsythe, 1974) to compute the dispersion within three genotype groups. Compared with the Levene’s test (Levene, 1960), the BF test is more robust to outliers by choosing the median to replace the mean as the center of each genotype group. Moreover, a “deviation regression model” (DRM) has been proposed to allow continuous predictors such as minor allele dosages (Marderstein et al., 2021). Therefore, the predictor is not limited to minor allele counts of SNPs (i.e., 0, 1, and 2). Furthermore, a double generalized linear model (DGLM) was also developed to identify loci associated with trait variability and to detect interactions in genome-wide association studies (GWAS) (Ronnegard and Valdar, 2011; Smyth, 1989). BF, DRM, and DGLM are all the so-called parametric vQTL methods.
Non-parametric methods for vQTL identification include the Kruskal-Wallis test (KW) (Kruskal and Wallis, 1952) and a quantile integral linear model (QUAIL) (Miao et al., 2022). As the non-parametric counterpart of one-way analysis of variance (ANOVA), KW is used to compare the median difference between multiple groups. As long as we first calculate the deviation between trait values and within-group medians, KW can also be applied to test phenotypic homoscedasticity among the three genotype groups. Another non-parametric method, QUAIL, assesses genetic effects on phenotypic variability based on the quantile regression framework. It allows covariate adjustment, non-normally distributed traits, and continuous predictors (i.e., not limited to three genotype categories) (Miao et al., 2022). However, to integrate information from K quantiles (usually K = 100), QUAIL’s computation time is much longer than that of its competitors (BF, DRM, DGLM, and KW). Implementing QUAIL for genome-wide vQTL analysis is computationally challenging.
This work evaluates the performance of the abovementioned parametric or non-parametric methods in detecting vQTLs through Monte Carlo simulations. I performed a head-to-head comparison to assess the false positive rates (FPRs), power, and computation time across several popular vQTL methods. Furthermore, I apply these approaches to four lipid traits in the Taiwan Biobank (TWB) data, including high-density lipoprotein cholesterol (HDL), low-density lipoprotein cholesterol (LDL), triglyceride (TG), and total cholesterol (TCHO). After identifying vQTLs for these four traits, I perform direct GxE and GxG analyses to demonstrate enriched interaction effects among vQTLs.
2 Materials and methods
2.1 Covariate-adjusted residuals
Let
2.2 Non-parametric vQTL methods
Non-parametric vQTL methods are developed without the assumption of normally distributed traits. Therefore, they are generally more robust to trait distributions (Miao et al., 2022). Like many non-parametric statistical tests, they may be less powerful than parametric vQTL methods when the traits indeed follow normal distributions (Ronnegard and Valdar, 2012). However, they can have more valid results even when the traits are not normally distributed. In the following, I introduce two non-parametric methods, including the Kruskal-Wallis test (KW) and the quantile integral linear model (QUAIL).
2.2.1 Kruskal-Wallis test (KW)
KW is a non-parametric statistical test used to assess whether there is a significant difference between the medians of two or more independent groups. Because the KW test is not originally designed to detect differences in variances, I have to convert the data into a measure of “dispersion” before using KW. Let
where N is the total sample size,
2.2.2 Quantile integral linear model (QUAIL)
Although I have adjusted the covariates’ effects in Section 2.1, I still present the statistical model shown in QUAIL’s paper as follows (Miao et al., 2022). Through this model, people may get a complete understanding of Miao et al.’s methodology. Let the conditional quantile function (Q) of the trait Y at a SNP be Equation 2 as follows
where
2.3 Parametric vQTL methods
Parametric vQTL methods are usually developed with the assumption of normally distributed traits (Marderstein et al., 2021; Young et al., 2018). Therefore, their performances are generally more sensitive to trait distributions (Miao et al., 2022). However, if the trait really follows the normal distribution, parametric tests can be more powerful than their non-parametric counterparts. In the following, I describe three parametric vQTL methods, including the deviation regression model (DRM), Brown-Forsythe test (BF), and the double generalized linear model (DGLM).
2.3.1 Deviation regression model (DRM)
To search for SNPs that are associated with phenotypic variability, Marderstein et al. regressed
2.3.2 Brown-Forsythe test (BF)
The BF test statistic is listed in Equation 4 as follows,
where
To implement the BF test, I used the R function “leveneTest” while specifying “center = median” in the “car” package (version 3.1–3) (Fox and Weisberg, 2018). The Levene’s and the BF tests are both used to test the homoscedasticity across different groups. The BF test is more robust to outliers than Levene’s test because it uses deviations from the group median, while Levene’s test uses deviations from the group mean.
2.3.3 Double generalized linear model (DGLM)
DGLM fits two generalized linear models (GLM) that model genetic effects on the mean and dispersion of the covariate-adjusted residuals (Zhang and Bell, 2024). It regresses the covariate-adjusted residual
2.4 Data from the Taiwan Biobank (TWB)
As of February 2024, 147,836 individuals aged 30 to 70 years have been genotyped whole-genome. The TWB performed genotype imputation with the IMPUTE2 software (v2.3.1) (Delaneau et al., 2013; Howie et al., 2009). To improve the imputation accuracy, the TWB combined 1,445 TWB individuals with whole-genome sequence data and 504 East Asians (EAS) from the 1,000 Genomes Phase 3 v5 as the reference panel (Wei et al., 2021). After completing the imputation, the TWB excluded variants with missing rates >5%, minor allele frequencies (MAFs) < 0.01%, or imputation information scores <0.3, where 0.3 was usually adopted as the acceptable threshold for imputation quality (Kosugi et al., 2023). After this quality control filtering, we had 9,814,944 autosomal variants for analysis.
2.5 Simulation studies
I performed simulations to evaluate the type I error rates and power of the five abovementioned methods. I here focused on the simulations of detecting SNP1-by-SNP2 interactions. Nonetheless, the results can be generalized to GxE identification. I randomly selected four SNPs as SNP1, including rs34625133 on chromosome (chr.) 1, rs7870809 on chr. 9, rs7982209 on chr. 13, and rs140100 on chr. 22. The MAFs of these four SNPs were 0.1, 0.2, 0.3, and 0.4, respectively. They were, in turn (one by one), treated as SNP1. I aimed to evaluate the power of detecting SNP1-by-SNP2 interactions under various MAFs of SNP1.
I used 20,000 common SNPs (MAFs
where
By specifying
Although Equation (5) is originally designed to test for GxG interactions, the results can be generalized to GxE identification. Without loss of generality, I can replace
3 Results
3.1 Simulation studies
3.1.1 False positive rates
Figure 1 presents the QQ plots (the left column) and power under α = 5E-8 (columns 2–5) for five vQTL tests (N = 30,000; with SNP main effects). Supplementary Figure S1 demonstrates the results given N = 30,000, and SNP main effects do not exist. Supplementary Figure S2 (with SNP main effects) and S3 (without SNP main effects) show the results when N = 147,836. The rank-based inverse-normal transformation (INT) can convert data to a normal distribution. However, previous vQTL studies found that this INT transformation led to inflation in false positive rates (FPR) (Marderstein et al., 2021; Miao et al., 2022; Wang et al., 2019). The FPR was calculated by dividing the incorrectly classified negatives by the total negatives. To evaluate INT’s performance in our simulation setting, we equipped it with DGLM.

Figure 1. QQ plots (the left column) and power (
3.1.1.1 Parametric tests
The results show that DGLM has inflated FPR when the phenotypes are not normally distributed (Figure 1; Supplementary Figures S1–S3F, K). Taking the INT transformation helps to adjust FPR for kurtotic phenotypes (Figure 1; Supplementary Figures S1–S3F) but not for skewed phenotypes (Figure 1; Supplementary Figures S1–S3K).
The BF test has inflated FPR given kurtotic phenotypes (Figure 1; Supplementary Figures S1–S3F), and the situation is getting worse for smaller sample sizes (Figure 1; Supplementary Figure S1F). Supplementary Figures S4–S15 demonstrate the QQ plots stratified by the MAF range. For the BF test, the inflation in FPR is especially severe for kurtotic phenotypes when N = 30,000 and MAF <0.2 (Supplementary Figures S5, S8, (A) (B) (C)). Given N = 30,000 and a small MAF, a genotype group may only contain a few observations, and false positives may come with the issue of data sparsity (Lin, 2024).
DRM is similar to BF, except that DRM treats genotypes as a continuous scale (0, 1, or 2). DRM does not categorize the three genotypes into three groups. Therefore, the sparsity within genotype groups is not a critical problem for DRM, and the inflation in FPR is not critical for DRM compared with BF (Supplementary Figures S5, S8A–C).
3.1.1.2 Non-parametric tests
QUAIL preserves an appropriate FPR across three trait distributions. KW maintains a suitable FPR under the kurtotic distribution (Figure 1; Supplementary Figure S1–S3F) but has a slight inflation in FPR given the skewed distribution (Figure 1; Supplementary Figure S1–S3K). Under a kurtotic distribution, more extreme values can occur compared to the normal distribution. Nonetheless, by transforming the extreme values into ranks, KW can still preserve an appropriate FPR. By contrast, previous research has found that KW tends to have inflated FPR for heteroscedastic cases (McDonald, 2014). Under a skewed distribution, unequal variances in ranks across the three genotype groups may still exist even if there is no GxG or GxE. KW’s strategy to transform values into ranks cannot entirely address the heteroscedastic problem.
3.1.2 Statistical power
The statistical power was calculated by dividing the correctly classified positives by the total positives (each scenario was simulated 20,000 times). When the phenotype is normally distributed, DGLM and DGLM_INT were the most powerful methods (Figure 1; Supplementary Figure S1–S3C–E). Among the two non-parametric tests, QUAIL was superior to KW for normally distributed phenotypes (Figure 1; Supplementary Figure S1–S3C–E). When the phenotype is a kurtotic distribution, DGLM and BF had inflation in FPR (Figure 1; Supplementary Figure S1–S3F). KW became the most outstanding test among the four valid methods that preserve a suitable FPR (Figure 1; Supplementary Figure S1–S3F–J). When the phenotype is skewed, KW had a slight inflation in FPR, whereas DGLM and DGLM_INT had a severe inflation in FPR (Figure 1; Supplementary Figure S1–S3K). The other three valid methods (QUAIL, DRM, and BF) had similar power (Figure 1; Supplementary Figure S1–S3L–O).
By comparing the results of different simulation scenarios, I concluded three points: (1) When the MAF for SNP1 or SNP2 was larger, the percentage of the variance explained by the interaction effect increased, and the power of each method was enhanced. (2) When the sample size was enlarged from N = 30,000 to N = 147,836, the power performances of all methods were greatly improved. (3) Lastly, by comparing Figure 1 with Supplementary Figure S1, S2 with Supplementary Figure S3, the power of each method was higher in the presence of SNP main effects than in the absence of SNP main effects.
3.1.3 Computational time
Table 1 shows the computation time (in seconds) for each simulation replicate. I measured the execution time in R (version 4.3.1) on a Linux system running at 3.6 GHz and 32 GB of RAM. Although QUAIL is a valid test with appropriate FPR under three trait distributions, it takes much more computation time than other methods. The other non-parametric test, KW, requires only 1/70–1/90 of QUAIL’s execution time. The computation time of the four parametric tests is also reasonable. The time of computation needed for these tests is ordered as DRM < BF < KW < DGLM

Table 1. Computation time (in seconds) for each simulation replicate The execution time was measured in R (version 4.3.1) on a Linux system running at 3.6 GHz and 32 GB of RAM.
3.2 Real data analysis
3.2.1 30 vQTLs of four lipid traits
When conducting actual genome-wide data analysis, investigators had to exclude cryptic relatedness among individuals (Marees et al., 2018). The TWB investigated cryptic relatedness among participants with the KING (Kinship-based INference for GWAS) software (Manichaikul et al., 2010). I removed the person with higher missing genotype rates for each first- or second-degree relative pair. This step excluded 28,928 from the 147,836 TWB individuals, and 118,908 remained in the vQTL analysis.
I analyzed four lipid traits, including high-density lipoprotein cholesterol (HDL), low-density lipoprotein cholesterol (LDL), triglyceride (TG), and total cholesterol (TCHO). Supplementary Figure S16 shows the histograms of the four lipid traits. All four traits are right-skewed with positive skewness values, especially TG (skewness = 2.16). I first removed SNP’s main effect and covariates’ effects from each phenotype. Specifically, I extracted the residuals of regressing each trait on
Sex, age, and BMI significantly influence people’s lipid profiles (Beyene et al., 2020). Smoking impairs lipid metabolism by decreasing LDL receptor expression (Ma et al., 2020). Alcohol consumption disturbs lipid metabolism by increasing adipose tissue lipolysis and causing ectopic fat deposition in the liver (Steiner and Lang, 2017). Regular physical activity has been found to improve lipoprotein-lipid profiles (Lin, 2021). Moreover, lower educational attainment is usually linked to worse lipid profiles (Lara and Amigo, 2018). Because these seven covariates are associated with lipid traits, I adjusted them in all vQTL methods. Furthermore, the first 10 ancestry PCs are adjusted in genetic analyses to avoid population stratification (Uffelmann et al., 2021).
Based on the simulation results, I chose the KW test as the primary vQTL method. This non-parametric approach is robust to outliers by transforming trait values into ranks (Figure 1; Supplementary Figure S1–S3F). Moreover, although KW has a slight inflation in FPR for skewed trait distributions (Figure 1; Supplementary Figure S1–S3K), investigators can perform a follow-up regression analysis to check whether GxE and GxG exist (i.e., the so-called “direct GxE or GxG analysis”). Furthermore, KW needs only 1/70–1/90 of the other non-parametric competitor’s (QUAIL) execution time. Because SNPs with MAFs <0.05 are difficult to replicate in GxG or GxE findings (Lin, 2024; Wang et al., 2019), I only analyzed 2,580,790 common SNPs with MAFs
With the PLINK clumping procedure (Purcell et al., 2007), I detected 10 independent HDL-vQTLs (KW p-value < 5E-8) with linkage disequilibrium (LD) measure r2 < 0.01. Moreover, 6, 12, and 6 independent vQTLs were identified from LDL, TG, and TCHO, respectively. Four SNPs were found to be vQTLs of multiple lipid traits, including rs4704208 (near HMGCR), rs11748027 (in ANKDD1B), rs483082 (in APOC1), and rs662799 (in APOA5). Table 3 summarizes the 30 (=10 + 6+12+6–4) unique vQTLs of the four lipid traits.

Table 3. Thirty vQTLs (KW p-value < 5E-8 and linkage disequilibrium measure r2 < 0.01) of the four lipid traits.
3.2.2 Direct GxE analysis
From the regression (or variable selection) perspective, it is important to keep the hierarchical structure between main and interaction effects (Zhou et al., 2021). Following previous vQTL research (Wang et al., 2019), after identifying vQTLs, I performed a “direct GxE analysis” by regressing the trait on the minor allele count of each vQTL (0, 1, or 2), seven environmental factors (Es), and the product (interaction) term between the minor allele count and one of the seven Es. To remove population stratification, I also adjusted the top 10 ancestry PCs in this regression. The seven Es are listed as the horizontal axis of Figure 2, including SEX (female vs male), SPO (performing regular exercise, yes vs no), EDU (educational attainment, an integer from 1 to 7), AGE (chronological age, in years), BMI (in kg/m2), DRK (alcohol consumption, yes vs no), and SMK (cigarette smoking status, yes vs no). As Westerman et al. (Westerman et al., 2022) indicated, some vQTLs were “pleiotropic” concerning phenotypic variability. Table 2 also shows that four loci are vQTLs shared by multiple lipid traits. Instead of performing the direct GxE and GxG analysis for respective vQTLs of each lipid trait (i.e., 10 vQTLs for HDL, 6 vQTLs for LDL, 12 vQTLs for TG, and 6 vQTLs for TCHO), I analyzed all 30 vQTLs to have a more comprehensive picture of the four lipid traits.

Figure 2. The phylogenetic heat map of the gene-environment interaction analysis for triglyceride (TG). The magnitude of the value represents–log10 (two-sided p-value of the SNP-E interaction), which is always positive. However, I deliberately added a positive/negative sign before the magnitude. A positive sign indicates that the environmental factor (E) exacerbates the vQTLs’ effects. In contrast, a negative sign suggests that the E attenuates the vQTLs’ effects. The x-axis lists the 7 Es, including SEX (female vs. male), SPO (performing regular exercise, yes vs no), EDU (educational attainment, an integer from 1 to 7), AGE (chronological age, in years), BMI (in kg/m2), DRK (alcohol consumption, yes vs no), and SMK (cigarette smoking status, yes vs no).
Because TG had more vQTLs compared with the other three traits, I present the phylogenetic heat map of the direct GxE analysis for TG in Figure 2. The x-axis of Figure 2 lists the 7 Es. The magnitude of the value in Figure 2 represents–log10 (two-sided p-value of the SNP-E interaction), which is always positive. However, I deliberately added a positive/negative sign before the magnitude. A positive sign indicates that E exacerbates the vQTLs’ effects. In contrast, a negative sign suggests that the E attenuates the vQTLs’ effects. Females have more attenuated TG genetic effects than males (blue color in SEX column), whereas higher BMI, alcohol consumption, and cigarette smoking lead to more substantial TG genetic effects (red color in BMI, DRK, and SMK columns). Moreover, the phylogenetic heat maps for the remaining three traits are demonstrated in Supplementary Figures S17–S19.
3.2.3 Direct GxG analysis
I further performed a “direct GxG analysis” by regressing the trait on the minor allele counts (0, 1, or 2) of two SNPs, the seven Es mentioned above, and the product (interaction) term of the minor allele counts of the two SNPs. Similarly, I adjusted the top 10 ancestry PCs in this regression. Figure 3 shows the phylogenetic heat map of the direct GxG analysis for TG. The maps for the other three lipid traits can be found in Supplementary Figures S20–S22.

Figure 3. The phylogenetic heat map of the gene-gene interaction analysis for triglyceride (TG). The magnitude of the value represents–log10 (two-sided p-value of the SNP-SNP interaction), which is always positive. The most significant SNP-SNP interaction is between rs438811 (chr. 19, in APOC1) and rs662799 (chr. 11, in APOA5), where p = 1.3E-15. SNP rs483082 is highly correlated with rs438811 (r2 = 0.99), and rs3741297 is in linkage disequilibrium with rs662799 (r2 = 0.19). Therefore, despite four black cells indicating four significant SNP-SNP interactions, I only highlighted rs438811-rs662799 interaction.
Regarding TG, the interaction between rs438811 (chr. 19, in APOC1) and rs662799 (chr. 11, in APOA5) is the most significant among the
4 Discussion
Searching for vQTLs greatly facilitates the mining of GxE and GxG. For a GWAS incorporating one million SNPs and seven Es, seven million GxE tests and
This work evaluates the performance of three parametric (DRM, BF, and DGLM) and two non-parametric (KW and QUAIL) methods in detecting vQTLs. Although DGLM is the most powerful method for normally distributed traits (Figure 1; Supplementary Figure S1–S3C–E), it has severe inflation in FPR for traits following other distributions (Figure 1; Supplementary Figure S1–S3F, K). The random error term of DGLM is assumed to follow the normal distribution. This is why it fails to control for FPR when the trait is non-normally distributed. Therefore, DGLM should not be adopted for vQTL detection when the trait distribution is unclear. Taking INT transformation helps adjust FPR for kurtotic traits (Figure 1; Supplementary Figure S1–S3F) but not for skewed traits (Figure 1; Supplementary Figure S1–S3K). Transforming trait values into ranks like INT and KW can address the outlier problem in kurtotic distributions. However, these two rank-based methods, especially DGLM_INT, cannot fully solve the heteroscedastic problem under skewed traits (Figure 1; Supplementary Figure S1–S3K).
BF has inflated FPR under kurtotic traits when MAF <0.2 (Supplementary Figures S5, S8A–C). The BF test is equivalent to the ANOVA F test for regressing
DRM has a power performance similar to that of BF. These two tests are based on a similar strategy. DRM regresses
Two non-parametric tests, QUAIL and KW, were evaluated in this work. Although QUAIL maintains an appropriate FPR under normally or non-normally distributed traits (column 1 of Figure 1; Supplementary Figure S1–S3), its statistical power is never among the optimal choices under any situation (columns 2–5 of Figure 1; Supplementary Figure S1–S3). Besides, the power of QUAIL can be compromised by quantile crossing, which is a well-known issue when estimating multiple quantiles simultaneously (Bondell et al., 2010). However, the QUAIL R code does not evaluate whether an analysis suffers from quantile crossing. Moreover, QUAIL requires much longer computation time than the other competitors. Therefore, I would not recommend using QUAIL to perform a genome-wide vQTL search. On the other hand, KW is the most powerful vQTL test for kurtotic phenotypes (Figure 1; Supplementary Figure S1–S3G–J). When the traits are skewed, KW presents a slight inflation in FPR (Figure 1; Supplementary Figure S1–S3K). However, the subsequent direct GxE or GxG analysis can further assess the significance of GxE and GxG. Therefore, after considering the computation time, I chose KW to analyze the four lipid traits in the TWB data.
With the comprehensive simulations in this work, DRM and KW are the most recommended parametric and non-parametric vQTL tests, respectively. Unlike DRM (Marderstein et al., 2021), KW has no specific paper to introduce its implementation in detecting vQTLs. Therefore, as a demonstration, I here used KW to analyze the four TWB lipid traits. Other investigators may also choose DRM to perform genome-wide vQTL searches because of its adequate control of FPR and shorter computational time (Table 1). If one applies two or more methods to the same data set, he/she may explore the superiority of each method by comparing the replication rates of different techniques. That is, one can first separate the data set into a discovery set and a replication set, then calculate the replication rates of different methods.
Same with previous vQTL works (Lin, 2024; Wang et al., 2019), I only analyzed common SNPs with MAFs
For TG, APOA5 rs662799 has been found to interact with cigarette smoking and alcohol consumption, based on individuals from Korea (Park and Kang, 2020). These interactions were also associated with metabolic syndrome (MetS) by analyzing subjects from Jilin Province of China (Wu et al., 2016). Because TG is one critical component of MetS, both studies (Park and Kang, 2020; Wu et al., 2016) were in line with our analysis results (APOA5 rs662799 is on the second row of Figure 2).
DRM and KW are the most recommended parametric and non-parametric vQTL methods. QUAIL appropriately preserves the FPR under normally or non-normally distributed traits. However, its power is never among the optimal choices, and its computational time is much longer than the other competitors. BF suffers from severe inflation in FPR when SNP’s MAF <0.2. We may alleviate this inflation in FPR by increasing the sample size to, say, N ∼ 150,000. DRM and BF are based on a similar strategy. DRM regresses the deviation between individuals’ covariate-adjusted residuals and their group medians on the minor allele count of an SNP, whereas BF regresses the deviation between individuals’ covariate-adjusted residuals and their group medians on two indicator variables categorizing three genotypes. Therefore, DRM and BF have similar power. However, BF suffers from more inflation in FPR than DRM does, mainly when a small sample size is observed in one genotype. Although DGLM is the most powerful method given normally distributed traits, it is not valid (i.e., producing large FPR) for non-normally distributed traits. Adopting the rank-based INT transformation can address the outlier problem and adjust the FPR for kurtotic traits. However, INT cannot fully solve the heteroscedastic issue of skewed traits.
Recently, a robust Bayesian mixed model rooted in the one-way ANOVA has been developed (Fan et al., 2025). By specifying a likelihood based on a heavy-tailed distribution, this method can improve robustness in GxE detection. With Fan et al.‘s concept (Fan et al., 2025), the non-robust parametric tests can build a robust mixed-effect model.
Data availability statement
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found below: https://www.twbiobank.org.tw/new_web/, TWBR10810-07.
Ethics statement
The studies involving humans were approved by National Taiwan University Hospital. The studies were conducted in accordance with the local legislation and institutional requirements. The participants provided their written informed consent to participate in this study. No potentially identifiable images or data are presented in this study.
Author contributions
W-YL: Formal Analysis, Methodology, Validation, Project administration, Data curation, Supervision, Writing – original draft, Conceptualization, Software, Funding acquisition, Investigation, Visualization, Resources, Writing – review and editing.
Funding
The author(s) declare that financial support was received for the research and/or publication of this article. This study was supported by the National Science and Technology Council of Taiwan (grant number 112-2628-B-002-024-MY3 to W.-Y.L.).
Acknowledgments
The authors thank the reviewers for their constructive comments and all the Taiwan Biobank investigators, staff, and participants for their contributions. This research has been conducted using the Taiwan Biobank Resource under application number TWBR10810-07.
Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Generative AI statement
The author(s) declare that no Generative AI was used in the creation of this manuscript.
Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.
Publisher’s note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
Supplementary material
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fgene.2025.1617504/full#supplementary-material
References
Beyene, H. B., Olshansky, G., Aa, T. S., Giles, C., Huynh, K., Cinel, M., et al. (2020). High-coverage plasma lipidomics reveals novel sex-specific lipidomic fingerprints of age and BMI: evidence from two large population cohort studies. PLoS Biol. Sep. 18, e3000870. doi:10.1371/journal.pbio.3000870
Bondell, H. D., Reich, B. J., and Wang, H. (2010). Noncrossing quantile regression curve estimation. Biometrika 97, 825–838. doi:10.1093/biomet/asq048
Breusch, T. S., and Pagan, A. R. (1979). A simple test for heteroscedasticity and random coefficient variation. Econometrica 47, 1287–1294. doi:10.2307/1911963
Brown, M. B., and Forsythe, A. B. (1974). The small sample behavior of some statistics which test the equality of several means. Technometrics 16, 129–132. doi:10.1080/00401706.1974.10489158
Delaneau, O., Zagury, J. F., and Marchini, J. (2013). Improved whole-chromosome phasing for disease and population genetic studies. Nat. Methods 10, 5–6. doi:10.1038/nmeth.2307
Fan, K., Jiang, Y., Ma, S. G., Wang, W. Q., and Wu, C. (2025). Robust sparse Bayesian regression for longitudinal gene-environment interactions. J. Roy. Stat. Soc. C, qlaf027. doi:10.1093/jrsssc/qlaf027
Fox, J., and Weisberg, S. (2018). An R companion to applied regression. Thousand Oaks, United States: Sage Publications. Third Edition.
Howie, B. N., Donnelly, P., and Marchini, J. (2009). A flexible and accurate genotype imputation method for the next generation of genome-wide association studies. Plos Genet. Jun 5, e1000529. doi:10.1371/journal.pgen.1000529
Kosugi, S., Kamatani, Y., Harada, K., Tomizuka, K., Momozawa, Y., Morisaki, T., et al. (2023). Detection of trait-associated structural variations using short-read sequencing. Cell. Genom 14 (3), 100328. doi:10.1016/j.xgen.2023.100328
Kozlowski, J., and Gawelczyk, A. T. (2002). Why are species' body size distributions usually skewed to the right?. Funct. Ecol. 16., 419–432. doi:10.1046/j.1365-2435.2002.00646.x
Kruskal, W. H., and Wallis, W. A. (1952). Use of ranks in one-criterion variance analysis. J. Am. Stat. Assoc. 47, 583–621. doi:10.2307/2280779
Lara, M., and Amigo, H. (2018). Association between education and blood lipid levels as income increases over a decade: a cohort study. BMC Public Health 18, 256. doi:10.1186/s12889-018-5185-3
Levene, H. (1960). Robust tests for equality of variances. In: Contributions to probability and statistics; essays in honor of harold hotelling. Redwood City, United States: Stanford University Press. p. 278–292.
Lin, W. Y. (2021). A large-scale observational study linking various kinds of physical exercise to lipoprotein-lipid profile. J. Int. Soc. Sport Nutr. 18, 35. doi:10.1186/s12970-021-00436-2
Lin, W. Y. (2024). Detecting gene-environment interactions from multiple continuous traits. Bioinformatics 40, btae419. doi:10.1093/bioinformatics/btae419
Ma, B., Chen, Y., Wang, X., Zhang, R., Niu, S., Ni, L., et al. (2020). Cigarette smoke exposure impairs lipid metabolism by decreasing low-density lipoprotein receptor expression in hepatocytes. Lipids Health Dis. 19, 88. doi:10.1186/s12944-020-01276-w
Manichaikul, A., Mychaleckyj, J. C., Rich, S. S., Daly, K., Sale, M., and Chen, W. M. (2010). Robust relationship inference in genome-wide association studies. Bioinformatics 26, 2867–2873. doi:10.1093/bioinformatics/btq559
Marderstein, A. R., Davenport, E. R., Kulm, S., Van Hout, C. V., Elemento, O., and Clark, A. G. (2021). Leveraging phenotypic variability to identify genetic interactions in human phenotypes. Am. J. Hum. Genet. 108, 49–67. doi:10.1016/j.ajhg.2020.11.016
Marees, A. T., de Kluiver, H., Stringer, S., Vorspan, F., Curis, E., Marie-Claire, C., et al. (2018). A tutorial on conducting genome-wide association studies: quality control and statistical analysis. Int. J. Methods Psychiatr. Res. 27, e1608. doi:10.1002/mpr.1608
McDonald, J. H. (2014). Handbook of biological statistics. Baltimore, Maryland: Sparky House Publishing. 3rd Edition.
Miao, J., Lin, Y., Wu, Y., Zheng, B., Schmitz, L. L., Fletcher, J. M., et al. (2022). A quantile integral linear model to quantify genetic effects on phenotypic variability. Proc. Natl. Acad. Sci. U. S. A. 119, e2212959119. doi:10.1073/pnas.2212959119
Pare, G., Cook, N. R., Ridker, P. M., and Chasman, D. I. (2010). On the use of variance per genotype as a tool to identify quantitative trait interaction effects: a report from the women's genome health Study. Plos Genet. Jun 6, e1000981. doi:10.1371/journal.pgen.1000981
Park, S., and Kang, S. (2020). Alcohol, carbohydrate, and calcium intakes and smoking interactions with APOA5 rs662799 and rs2266788 were associated with elevated plasma triglyceride concentrations in a cross-sectional study of Korean adults. J. Acad. Nutr. Diet. 120, 1318–1329. doi:10.1016/j.jand.2020.01.009
Purcell, S., Neale, B., Todd-Brown, K., Thomas, L., Ferreira, M. A., Bender, D., et al. (2007). PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575. doi:10.1086/519795
Ronnegard, L., and Valdar, W. (2011). Detecting major genetic loci controlling phenotypic variability in experimental crosses. Genet. Jun 188, 435–447. doi:10.1534/genetics.111.127068
Ronnegard, L., and Valdar, W. (2012). Recent developments in statistical methods for detecting genetic loci affecting phenotypic variability. Bmc Genet. 13, 63. doi:10.1186/1471-2156-13-63
Shi, G. (2022). Genome-wide variance quantitative trait locus analysis suggests small interaction effects in blood pressure traits. Sci. Rep. 12, 12649. doi:10.1038/s41598-022-16908-7
Smyth, G. K. (1989). Generalized linear-models with varying dispersion. J. Roy. Stat. Soc. B 51, 47–60. doi:10.1111/j.2517-6161.1989.tb01747.x
Steiner, J. L., and Lang, C. H. (2017). Alcohol, adipose tissue and lipid dysregulation. Biomol. Mar. 7, 16. doi:10.3390/biom7010016
Tharu, B. P., and Tsokos, C. P. (2017). A statistical study of serum cholesterol level by gender and race. J. Res. Health Sci. Jul 25 (17), e00386.
Uffelmann, E., Huang, Q. Q., Munung, N. S., de Vries, J., Okada, Y., Martin, A. R., et al. (2021). Genome-wide association studies. Nat. Rev. Method Prime 1, 59–21. doi:10.1038/s43586-021-00056-9
Wang, H. W., Zhang, F. T., Zeng, J., Wu, Y., Kemper, K. E., Xue, A. L., et al. (2019). Genotype-by-environment interactions inferred from genetic effects on phenotypic variability in the UK Biobank. Sci. Adv. 5, eaaw3538. doi:10.1126/sciadv.aaw3538
Wei, C. Y., Yang, J. H., Yeh, E. C., Tsai, M. F., Kao, H. J., Lo, C. Z., et al. (2021). Genetic profiles of 103,106 individuals in the Taiwan biobank provide insights into the health and history of Han Chinese. Npj Genom Med. 6, 10. doi:10.1038/s41525-021-00178-9
Westerman, K. E., Majarian, T. D., Giulianini, F., Jang, D. K., Miao, J., Florez, J. C., et al. (2022). Variance-quantitative trait loci enable systematic discovery of gene-environment interactions for cardiometabolic serum biomarkers. Nat. Commun. 13, 3993. doi:10.1038/s41467-022-31625-5
Wu, Y. H., Yu, Y. Q., Zhao, T. C., Wang, S. B., Fu, Y. L., Qi, Y., et al. (2016). Interactions of environmental factors and APOA1-APOC3-APOA4-APOA5 gene cluster gene polymorphisms with metabolic syndrome. Plos One 11, e0147946. doi:10.1371/journal.pone.0147946
Young, A. I., Wauthier, F. L., and Donnelly, P. (2018). Identifying loci affecting trait variability and detecting interactions in genome-wide association studies. Nat. Genet. Nov. 50, 1608–1614. doi:10.1038/s41588-018-0225-6
Zhang, X., and Bell, J. T. (2024). Detecting genetic effects on phenotype variability to capture gene-by-environment interactions: a systematic method comparison. G3 (Bethesda) 14, jkae022. doi:10.1093/g3journal/jkae022
Keywords: cholesterol, dyslipidemia, triglyceride, variance quantitative trait locus, Taiwan Biobank
Citation: Lin W-Y (2025) Mining for gene-environment and gene-gene interactions: parametric and non-parametric tests for detecting variance quantitative trait loci. Front. Genet. 16:1617504. doi: 10.3389/fgene.2025.1617504
Received: 24 April 2025; Accepted: 04 August 2025;
Published: 01 September 2025.
Edited by:
Ruzong Fan, Georgetown University Medical Center, United StatesReviewed by:
Cen Wu, Kansas State University, United StatesChi Zhang, Yale University, United States
Copyright © 2025 Lin. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Wan-Yu Lin, bGlud3lAbnR1LmVkdS50dw==
†ORCID: Wan-Yu Lin, orcid.org/0000-0002-3385-4702