Detecting Weak Signals by Combining Small P-Values in Genetic Association Studies

We approach the problem of combining top-ranking association statistics or P-values from a new perspective which leads to a remarkably simple and powerful method. Statistical methods, such as the rank truncated product (RTP), have been developed for combining top-ranking associations, and this general strategy has proved useful in applications for detecting combined effects of multiple disease components. To increase power, these methods aggregate signals across top-ranking single nucleotide polymorphisms (SNPs), while adjusting for the total number assessed in a study. Analytic expressions for combined top statistics or P-values tend to be unwieldy, which complicates interpretation and practical implementation and hinders further development. Here, we propose the augmented rank truncation (ART) method, which retains the main characteristics of the RTP but is substantially simpler to implement. ART leads to an efficient form of the adaptive algorithm, an approach in which the number of top-ranking SNPs is varied to optimize power. We illustrate our methods by strengthening previously reported associations of μ-opioid receptor variants with sensitivity to pain.

The k smallest unordered P-values are equicorrelated and also have the same marginal distribution, which can be obtained as a permutation distribution of the first k uniform order statistics. Assuming independence of L P-values and their uniform distribution under the null hypothesis, we can derive the correlation between any pair of the unordered k smallest P-values as ρ(k, L) = 3(L − k)/(2 + k(L − 2) + 5L). As L increases, the correlation approaches a limit that no longer depends on L: lim_{L→∞} ρ(k, L) = 3/(k + 5). The correlation can be substantial for small k and cannot be ignored. There is a very simple transformation that makes a set of k P-values uncorrelated. All that is needed to decorrelate these P-values is to scale the largest of them:

X_j = P_(j), j = 1, . . . , k − 1;  X_k = σ P_(k),

where

σ = (2L − k + 3 + √((k + 1)(L + 1)(L − k + 1))) / (4 + 2L),

and then randomly shuffle the set X_1, . . . , X_k. This scale factor σ can be derived by solving the mixture covariance linear equations induced by the permutation distribution of the first k order statistics. The decorrelated values can be further transformed so that each has the uniform (0,1) distribution marginally:

Beta(X_j; j, L − j + 1), j = 1, . . . , k − 1,   (S-1)
Beta(X_k/σ; k, L − k + 1),

where Beta(x; a, b) is the CDF of a beta(a, b) distribution evaluated at x. Although the scaling and subsequent shuffle removes the correlation, the values remain dependent, as illustrated in Figure 1.
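These quantities are easy to check by direct simulation. The sketch below (plain Python, standard library only; the function names are ours) estimates the pairwise correlation of the shuffled k smallest P-values and verifies that scaling the largest by σ before the shuffle removes it:

```python
import math
import random

def theoretical_rho(k, L):
    """Correlation between any pair of the k smallest (shuffled) P-values."""
    return 3.0 * (L - k) / (2.0 + k * (L - 2.0) + 5.0 * L)

def sigma_scale(k, L):
    """Scale factor sigma applied to the largest of the k smallest P-values."""
    return (2.0 * L - k + 3.0
            + math.sqrt((k + 1.0) * (L + 1.0) * (L - k + 1.0))) / (4.0 + 2.0 * L)

def sample_pair(k, L, rng, decorrelate=False):
    """Draw the k smallest of L uniforms, optionally scale the largest by
    sigma, shuffle, and return the first two entries of the shuffled set."""
    p = sorted(rng.random() for _ in range(L))[:k]
    if decorrelate:
        p[-1] *= sigma_scale(k, L)
    rng.shuffle(p)
    return p[0], p[1]

def corr(pairs):
    """Sample correlation of a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)
```

For example, with k = 2 and L = 10 the simulated correlation is close to ρ(2, 10) = 24/68 ≈ 0.353 and drops to approximately zero after the σ-scaling.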

S-2 DERIVATION OF THE RTP DISTRIBUTION
An intuitive way to understand our derivation of the RTP distribution is through references to simulations. The simplest, brute-force algorithm to obtain the RTP combined P-value is to simulate its distribution directly. If w_k is the observed product of the k smallest P-values, one can repeatedly (B times) simulate L Uniform(0,1) random variables U_i, sort them, take the product of the k smallest values, and compare the resulting product to w_k. As the number of simulations, B, increases, the proportion of simulated products that are smaller than w_k converges to the true combined RTP P-value.
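The brute-force algorithm can be sketched in a few lines (plain Python; the function name is ours):

```python
import random

def rtp_pvalue_brute(w_k, k, L, B, rng):
    """Brute-force Monte Carlo RTP P-value: the proportion of replicates in
    which the product of the k smallest of L uniforms falls below w_k."""
    hits = 0
    for _ in range(B):
        p = sorted(rng.random() for _ in range(L))
        prod = 1.0
        for v in p[:k]:
            prod *= v
        hits += prod < w_k
    return hits / B
```

As a sanity check, for k = L = 2 the product of two independent uniforms has the known CDF w(1 − ln w), which the simulation reproduces.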
There are several ways to optimize the above simulation scenario with respect to computational complexity. For instance, sets of ordered uniform P-values can be simulated directly using well-known results from the theory of order statistics. Although the marginal distribution of the ith ordered value is Beta(i, L − i + 1), to create the necessary dependency between the ordered P-values, sets of k values have to be simulated in a step-wise, conditional fashion. The minimum value, P_(1), can be sampled from the Beta(1, L) distribution. Alternatively, using the relationship between beta and Uniform(0,1) random variables, it can be sampled as P_(1) = 1 − U_1^{1/L}. Next, since the value P_(2) cannot be smaller than P_(1) = p_(1), conditionally on the obtained value, it has to be generated from a truncated beta distribution. The third smallest value should be sampled conditionally on the second one, and so on (Balakrishnan and Rao (1998)). Therefore, the sequence and the product w_k can be obtained by simulating k ordered P-values, rather than all L unsorted values.
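The step-wise scheme can be sketched as follows (ours; it uses the standard fact that, given the current smallest value p, the next order statistic is the minimum of the remaining uniforms on (p, 1)):

```python
import random

def k_smallest_ordered(k, L, rng):
    """Step-wise simulation of the k smallest of L uniform order statistics:
    the minimum is 1 - U**(1/L); each subsequent value is the minimum of the
    remaining uniforms, conditionally truncated to lie above the previous one."""
    out = []
    p = 0.0
    for j in range(k):
        # minimum of the L - j values that are i.i.d. Uniform(p, 1)
        p = 1.0 - (1.0 - p) * rng.random() ** (1.0 / (L - j))
        out.append(p)
    return out
```

The simulated values match the marginal means i/(L + 1) of the Beta(i, L − i + 1) distributions.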
Further optimization of the simulation algorithm is illustrative because it provides intuition for the theoretical derivation of the RTP distribution. This optimization is achieved by using the Markov property of order statistics. Specifically, the unordered set {P_1, P_2, . . . , P_k | P_(k+1) = p_(k+1)} behaves as a sample of k independent variables, identically distributed as Uniform(0, p_(k+1)). This is a usual step in analytic derivations of truncated product distributions, and it follows by averaging over the density of P_(k+1). This approach was employed earlier by Dudbridge and Koeleman (2003), and our derivation of an alternative form of the RTP distribution follows this conditioning idea. After re-scaling,

P_i / p_(k+1) ∼ i.i.d. Uniform(0, 1), i = 1, . . . , k.

The capital P_i notation is used here to emphasize the fact that the variable is random, while the lowercase p_(k+1) refers to a realized value of a random variable, P_(k+1) = p_(k+1). Next, given that P_(k+1) ∼ Beta(k + 1, L − k), minus the log of the product of independent conditional uniform random variables follows a gamma distribution. Specifically,

−Σ_{i=1}^{k} ln(P_i / p_(k+1)) ∼ Gamma(k, 1).

The above manipulations reduce the set of k random variables to a set of just two variables: a gamma and a beta. Therefore, the combined RTP P-value can be evaluated numerically by simulating only pairs of beta- and gamma-distributed random variables as follows. We note that

−ln(W_k) = −Σ_{i=1}^{k} ln(P_i / P_(k+1)) − k ln(P_(k+1)),

and define

X ∼ Beta(k + 1, L − k), Y ∼ Gamma(k, 1).   (S-7)

The empirical distribution of the product of k values under H_0 can then be obtained by repeatedly simulating X and Y, and comparing the observed value of −ln(w_k) to Z = Y − k ln(X) in every simulation. P_RTP would then be defined as the proportion of times simulated values of Z were larger than −ln(w_k). Surprisingly, one can simultaneously evaluate probabilities for two consecutive partial products by reusing the same pair of random numbers, which follows from the fact that

W_{k+1} = W_k × P_(k+1), and therefore −ln(W_{k+1}) = Y − (k + 1) ln(X).

In the latter case, −ln(w) is compared to Z = Y − (k + 1) ln(X).
This simulation method is very fast and approaches the exact solution as the number of simulated pairs increases. Moreover, these simulation experiments make it clear that, once one conditions on the observed value of p_(k+1), the test statistic is formed as a product/sum of independent random variables. Specifically, the gamma distribution of the Y variable in Eq. (S-7) is conditional on the observed X = p_(k+1) when the pairs (X, Y) are simulated. Alternatively, one can first simulate X = p_(k+1) and then generate a test statistic using k uniform random variables, U_1, U_2, . . . , U_k, on the (0, p_(k+1)) interval.
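The two-variable simulation can be sketched directly (ours; standard-library Python): each replicate draws one X ∼ Beta(k + 1, L − k) and one Y ∼ Gamma(k, 1) and compares Z = Y − k ln(X) to −ln(w_k):

```python
import math
import random

def rtp_pvalue_pairs(w_k, k, L, B, rng):
    """RTP P-value from beta-gamma pairs: proportion of replicates with
    Z = Y - k*ln(X) > -ln(w_k), where X ~ Beta(k+1, L-k), Y ~ Gamma(k, 1)."""
    t = -math.log(w_k)
    hits = 0
    for _ in range(B):
        x = rng.betavariate(k + 1, L - k)
        y = rng.gammavariate(k, 1.0)
        hits += (y - k * math.log(x)) > t
    return hits / B
```

For k = 1 and L = 2 the exact answer is Pr(min of two uniforms ≤ w) = 1 − (1 − w)², which the pair simulation reproduces with only two random draws per replicate.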
We just described a way to evaluate the RTP distribution by repeated sampling of two random variables in order to elucidate the idea that the combined RTP P-value can be obtained by integrating out the random upper bound P_(k+1) over its probability density function. The random P_(k+1) has to be at least as large as p_(k) but smaller than one, p_(k) ≤ P_(k+1) ≤ 1. After re-expressing p_(k) in terms of the observed product w = Π_{i=1}^{k} p_(i), it becomes evident that w^{1/k} ≤ P_(k+1) ≤ 1, because the product is maximized if p_(i) = p_(k) for all i = 1, . . . , k, so the observed p_(k) can be at most w^{1/k}. Now, integrating over the Beta density, f(·), with parameters k + 1, L − k, of the single variable P_(k+1), we treat w as a constant:

P_RTP(k) = Pr(W_k ≤ w) = ∫_{w^{1/k}}^{1} [1 − G_k(k ln(x) − ln(w))] f(x) dx + Pr(P_(k+1) < w^{1/k}).

Next, following the transformation u = Beta(x; k + 1, L − k), we can express the integral as an expectation and make the integration limits 0 to 1, and thus independent of w and k:

P_RTP(k) = ∫_{0}^{1} [1 − G_k(k ln H^{−1}(u; k + 1, L − k) − ln(w))] du,   (S-10)

where H^{−1}(·; a, b) is the inverse CDF of a beta(a, b) distribution and G_k(·) is the CDF of Gamma(k, 1), taken to be zero for negative arguments. P_RTP(k) is the combined RTP P-value. Also note that two partial products can be evaluated at the same time,

P_RTP(k + 1) = ∫_{0}^{1} [1 − G_k((k + 1) ln H^{−1}(u; k + 1, L − k) − ln(w))] du,

where w is now the observed product of the k + 1 smallest P-values. We have now derived simple expressions that involve only a single integral, where the integration limits (Eq. (S-10)) no longer involve a product value w and are conveniently bounded within the zero to one interval. Eq. (S-10) illustrates that the RTP distribution can be viewed as the expectation of a function of a uniform random variable, U ∼ Uniform(0,1).
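This expectation can be approximated by averaging the integrand over beta draws: applying the inverse beta CDF to U ∼ Uniform(0,1) is equivalent in distribution to sampling X ∼ Beta(k + 1, L − k) directly. A sketch (ours; for integer k, the Gamma(k, 1) CDF has the closed Erlang form used below):

```python
import math
import random

def gamma_cdf_int(t, k):
    """CDF of Gamma(k, 1) for integer shape k (Erlang form); zero for t <= 0."""
    if t <= 0.0:
        return 0.0
    s = sum(t ** i / math.factorial(i) for i in range(k))
    return 1.0 - math.exp(-t) * s

def rtp_pvalue_integral(w, k, L, B, rng):
    """Average the integrand 1 - G_k(k*ln(X) - ln(w)) over draws of
    X ~ Beta(k+1, L-k); this is the Monte Carlo form of the single-integral
    representation of the RTP P-value."""
    total = 0.0
    for _ in range(B):
        x = rng.betavariate(k + 1, L - k)
        total += 1.0 - gamma_cdf_int(k * math.log(x) - math.log(w), k)
    return total / B
```

Because the integrand is a smooth probability rather than a 0/1 indicator, this average converges faster than the pair simulation for the same number of draws.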

S-4 DERIVATION OF THE ART-A DISTRIBUTION
As we discussed, ordered P-values can be represented as functions of the same number of independent uniform random variables (Eq. S-3). This reveals that the jth value, p_(j), is a function of all p_(i<j) and that in a given set of k variables (i.e., conditionally) all information is contained in k independent random variables, U_1, U_2, . . . , U_k. These independent components can be extracted and utilized. Specifically, by using the conditional distribution of W_i, which depends only on the two preceding partial products, W_{i−1} and W_{i−2}, we define independent variables Z_i as Z_i = Pr(W_i > w_i | W_{i−1}, W_{i−2}). Successive partial products relate to one another as:

W_i = W_{i−1} P_(i), with P_(i) = 1 − (1 − P_(i−1)) U_i^{1/(L−i+1)} and P_(i−1) = W_{i−1}/W_{i−2}.

Since U_i^{1/(L−i+1)} ∼ Beta(L − i + 1, 1), the conditional density and the CDF for the product follow from the conditional survival function of P_(i) given P_(i−1) = p_(i−1), namely ((1 − x)/(1 − p_(i−1)))^{L−i+1}. Then,

Z_i = Pr(W_i > w_i | W_{i−1} = w_{i−1}, W_{i−2} = w_{i−2}) = ((1 − w_i/w_{i−1}) / (1 − w_{i−1}/w_{i−2}))^{L−i+1}.

We have now obtained a transformation to a new set of independent Uniform(0,1) random variables.
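This transformation is easy to sketch in terms of the successive order statistics p_(i) = w_i / w_{i−1} (ours; standard-library Python). Under H_0 the resulting Z_i are independent Uniform(0,1):

```python
import random

def z_transform(p_sorted, L):
    """Map the sorted k smallest of L P-values to variables Z_i via the
    conditional survival function of each order statistic given the previous
    one; under H0 the Z_i are independent Uniform(0,1)."""
    z, prev = [], 0.0
    for i, p in enumerate(p_sorted, start=1):
        z.append(((1.0 - p) / (1.0 - prev)) ** (L - i + 1))
        prev = p
    return z
```

A quick simulation under the null confirms that each Z_i has mean 1/2 and that consecutive Z_i are uncorrelated.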
These uniform variables can be combined through the inverse gamma CDF transformation,

Y = Σ_{i=1}^{k} G^{−1}_{λ_i}(1 − Z_i),

where G^{−1}_{λ_i} is the inverse gamma CDF with the shape λ_i and the scale 1. Under H_0, Y has a gamma distribution with the shape equal to the sum Σ_{i=1}^{k} λ_i. The combined P-value is now obtained as:

P = 1 − G_{Σλ_i}(Y).

When λ_i is large, the gamma CDF approaches the normal CDF, which motivates the inverse normal transformation, with quantiles calculated by using Φ^{−1}(1 − Z_i). The inverse normal method is useful because the joint distribution of the partial sums can be derived in a standard way to evaluate the adaptive ART (ART-A) P-value. For the ART-A, we define partial sums as:

S_j = Σ_{i=1}^{j} λ_i Φ^{−1}(1 − Z_i), j = 1, . . . , k,

where Φ^{−1}(·) is the inverse CDF of the standard normal distribution. Then, under the null hypothesis, S = (S_1, S_2, . . . , S_k)^T follows a multivariate normal distribution, MVN(0, Σ), with Σ = FWF^T, where F is the k × k lower-triangular matrix of ones, W = diag(λ_1², . . . , λ_k²), and the λ_i are weights. In our simulation experiments, we set all λ_i = 1; however, one may take advantage of information about the effect size distribution, if that is available. If power is high but the signal is sparse, true signals may be expected to rank among the smallest P-values. In this case, one possible sequence of weights is λ²_{k−i+1} = k/(k − i + 1). Such weights, which emphasize partial sums with few terms, can also be used in certain situations where the P-value distribution is expected to be skewed away from the uniform (e.g., due to discreteness of a test statistic), with many P-values being close to one. Finally, the vector S can be standardized as T = max_j S_j/σ_j, where σ_j² is the jth diagonal element of Σ. The null distribution of T is used to evaluate ART-A by using Pr(S_j/σ_j > s_j) probabilities and to obtain quantiles (significance thresholds) using commonly available MVN distribution functions (e.g., the mvtnorm R package) (Mi et al. (2009)).
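As a cross-check of the MVN-based evaluation, the tail probability of the maximum standardized partial sum can be approximated by simulating the partial sums directly. This sketch (ours; standard-library Python, `statistics.NormalDist` for Φ^{−1}) assumes partial sums of the form S_j = Σ_{i≤j} λ_i Φ^{−1}(1 − Z_i) with σ_j² = Σ_{i≤j} λ_i²:

```python
import math
import random
from statistics import NormalDist

def art_a_pvalue(z_values, lam, B, rng):
    """Monte Carlo P-value for T = max_j S_j / sigma_j, where
    S_j = sum_{i<=j} lam[i] * Phi^{-1}(1 - Z_i) and sigma_j^2 = sum_{i<=j} lam[i]^2."""
    nd = NormalDist()
    k = len(z_values)
    sig = [math.sqrt(sum(l * l for l in lam[:j + 1])) for j in range(k)]

    def t_stat(eps):
        s, t = 0.0, -math.inf
        for j in range(k):
            s += lam[j] * eps[j]
            t = max(t, s / sig[j])
        return t

    t_obs = t_stat([nd.inv_cdf(1.0 - z) for z in z_values])
    hits = sum(t_stat([rng.gauss(0.0, 1.0) for _ in range(k)]) >= t_obs
               for _ in range(B))
    return hits / B
```

In practice the MVN CDF (e.g., mvtnorm in R) replaces this brute-force loop, but the simulation makes the adaptive maximum over truncation points explicit.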

S-5 SIMULATION SETUP
We performed B = 100,000 simulations to evaluate the Type I error rate and power of the proposed methods. To study the performance of combination methods for independent P-values, in each simulation we generated L normally distributed statistics, X ∼ N(µ, 1). The squared values of X follow the chi-square distribution with one degree of freedom and noncentrality parameter µ², X² ∼ χ²(1, µ²). P-values were obtained as one minus the CDF of the noncentral chi-square evaluated at X², or as P = 2 − Φ(|X| + |µ|) − Φ(|X| − |µ|) in terms of the normal CDF. P-values generated from normal statistics (without squaring them) were also considered, but these results are omitted for brevity, because the resulting ranking of the methods by power was found to be similar. Under H_0, L P-values were sampled from the Uniform(0, 1) distribution, which is equivalent to setting µ to zero.
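At µ = 0 the expression above reduces to the ordinary two-sided normal P-value, P = 2(1 − Φ(|X|)). A minimal sketch of this null-calibrated form (ours, standard-library Python) confirms uniform P-values under H_0 and stochastically smaller P-values under a shift:

```python
import random
from statistics import NormalDist

def chisq1_pvalues(L, mu, rng):
    """One replicate: L P-values P = 2*(1 - Phi(|X|)) for statistics
    X ~ N(mu, 1); this is the mu = 0 form of the expression above, so
    mu = 0 gives Uniform(0,1) P-values under H0."""
    nd = NormalDist()
    return [2.0 * (1.0 - nd.cdf(abs(rng.gauss(mu, 1.0)))) for _ in range(L)]
```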
To study non-independent P-values, we simulated L statistics from a multivariate normal distribution, MVN(µ, Σ), and decorrelated them by eigendecomposition as described in the "Methods" section. In each simulation, a correlation matrix Σ was generated randomly by perturbing an equicorrelated matrix D. Specifically, we added a rank-one perturbation to the equicorrelated matrix D with off-diagonal elements ρ = 0.5 as:

R = D + uu^T,

where u is a random vector (Bartlett (1951)). Then, R was converted to a correlation matrix Σ with off-diagonal elements ρ_ij = R_ij / √(R_ii R_jj). The amount of "jiggle" in R depends on the variability of the elements in u. If the elements of u are generated in the range between −δ and δ, the value of δ represents the upper bound for the amount of jiggle allowed between pairwise correlations in Σ. In our simulations, we set δ = 1, allowing for a mix of positive and negative values of ρ_ij in Σ.
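This perturbation scheme can be sketched as follows (ours; the rank-one form R = D + uuᵀ is assumed from the Bartlett reference, with u_i ∼ Uniform(−δ, δ)):

```python
import random

def jiggled_corr(L, rho, delta, rng):
    """Perturb an equicorrelated matrix D (off-diagonal rho) by a rank-one
    update R = D + u u^T with u_i ~ Uniform(-delta, delta), then rescale to a
    correlation matrix: Sigma_ij = R_ij / sqrt(R_ii * R_jj)."""
    u = [rng.uniform(-delta, delta) for _ in range(L)]
    R = [[(1.0 if i == j else rho) + u[i] * u[j] for j in range(L)]
         for i in range(L)]
    return [[R[i][j] / (R[i][i] * R[j][j]) ** 0.5 for j in range(L)]
            for i in range(L)]
```

Because D is positive definite and uuᵀ is positive semi-definite, the rescaled matrix is a valid correlation matrix: unit diagonal, symmetric, off-diagonals strictly inside (−1, 1).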
In addition, we evaluated the power of the methods by using correlation due to linkage disequilibrium (LD) in real data (Table S1). The 11 × 11 correlation matrix was estimated from previously reported haplotype frequencies of eleven SNPs in the µ-opioid receptor (MOR) gene (Shabalina et al. (2008); Kuo et al. (2014)).