Robust Reference Powered Association Test of Genome-Wide Association Studies

Genome-wide association studies (GWASs) have identified abundant genetic susceptibility loci, GWAS of small sample size are far less from meeting the previous expectations due to low statistical power and false positive results. Effective statistical methods are required to further improve the analyses of massive GWAS data. Here we presented a new statistic (Robust Reference Powered Association Test1) to use large public database (gnomad) as reference to reduce concern of potential population stratification. To evaluate the performance of this statistic for various situations, we simulated multiple sets of sample size and frequencies to compute statistical power. Furthermore, we applied our method to several real datasets (psoriasis genome-wide association datasets and schizophrenia genome-wide association dataset) to evaluate the performance. Careful analyses indicated that our newly developed statistic outperformed several previously developed GWAS applications. Importantly, this statistic is more robust than naive merging method in the presence of small control-reference differentiation, therefore likely to detect more association signals.


INTRODUCTION
Genome-wide association studies (GWASs) have been widely applied with the goals to detect genetic variants which contribute to complex traits in the past decade (McCarthy et al., 2008). In general, allele frequencies of genetic variants are compared between cases that are supposed to have a high prevalence of susceptibility alleles and controls that are considered to have a lower prevalence of such alleles. And genomic loci correlated with various traits had been detected using the efficient approaches (He et al., 2009).
Although GWASs have led to abundant significant findings (Easton et al., 2007;Hakonarson et al., 2007;Parkes et al., 2007;Zeggini et al., 2007;Thomas et al., 2008), a few practical difficulties hinder the discovery of more rare or low-frequency genetic variants. For example, limited sample sizes make it difficult to achieve high statistical power which shows the probability of identifying the latent genetic variants (Wellcome Trust Case Control Consortium, 2007;He et al., 2009). Besides, difference in genetic background, also known as population stratification, between cases and controls could inflate type I error rate, thereby, leading to increasing level of false positive findings (Bacanu et al., 2000).
Availability of large public datasets contain large amount of useful genetic information. Utilizing large public datasets as reference may increase the sample size and ease the low power brought by insufficient sample size. However, population stratification and genotyping platform differences (He et al., 2009) between public datasets and cases may lead to inflated type I error rate. Our study aim is to appropriately utilize the public datasets to conservatively ameliorate the situation of small samples and low statistical power. In this manuscript, we describe a novel method to properly use public datasets (gnomad) as reference. In particular, we introduce a large public population as reference which has similar genetic background with control group. The null hypothesis was that allele frequency of SNPs among case (ca), control (co) and reference (re) was equal, and that the control and reference might compose a new large reference (co+re).
Specifically, we constructed a new test statistic T = ln(P(ca − (co + re))) − ln(P(co − re)) in which P(ca − (co + re)) is p-value of 2 × 2 Fisher exact test (Fisher, 1922(Fisher, , 1954Agresti, 1992) of the SNP between case and control+reference, and P(co − re) is p-value of 2 × 2 Fisher exact test (Fisher, 1954;Agresti, 1992) of the SNP between control and reference. The T statistic takes differences between control and reference into account, which is more robust than P(ca − (co + re)) or P(ca − re)).
In this manuscript, we present a method to use public datasets as reference in the association analysis. The results of simulation and real data application showed that our new method could increase statistical power, particularly for small GWAS researches in real application. And the online tool (Robust Reference Powered Association Test 2 ) has been made available.

Results From Simulation Study
To evaluate the performance of our model for various situations, we simulated six parameters to compute the desired statistical power. The parameters include case sample size, allele frequency in case, control sample size, allele frequency in control, reference sample size and allele frequency in reference. Six different parameters were set to several typical values to simulate real scenarios. The case and control size were set from small to large. Also, allele frequencies were set from rare to common. Additionally, we selected two different reference of 10,000 and 100,000 samples. In detail, we set case size (case = 100, 500, 1000, 3000), control size (control = 0.5 * case, 1 * case, 2 * case, 5 * case), reference size (reference = 10,000, 100,000   Table S1). Given sample size (N) and allele frequency (q), we simulated the count of allele which followed binomial distribution [B(2N, q)] for 100000 times. We drawn the QQ plot (Turner, 2014) of a representative simulation under H0 to assess the test statistic is well-calibrated (case size = 500, allele frequency in case = 0.165, control size = 500, allele frequency in control = 0.165, reference size = 10000, allele frequency in reference = 0.165) (Supplementary Figure S1). We have selected several representative cases to analyze the whole results of simulations (Table 1). First, supposed that the sample size of case, control and reference equaled 500, 500, and 10,000, respectively which were close to the real conditions. And allele frequencies in case, control, and reference were initialized to be common (0.15). As indicated in Table 1, the type 1 error rate of T was a bit higher than that of ca−(co+re), while both were less than significant levels (alpha = 0.05, 0.01, 0.001). When there was small population genetic differentiation between control and reference, allele frequencies in case and control equaled 0.165 while allele frequency in reference was set as 0.15. In this case, the type 1 error rate of T was less than that of ca−(co+re). Therefore, as mentioned in Section "Materials and Methods, " our new test is more robust than the simple P(ca − (co + re)) method in the presence of small control-reference differentiation. As shown in Table 1, false positive results might occur if there was small population genetic differentiation between control and reference while no differentiation between case and control. To evaluate the influence of control size, we simulated different control sizes with the aim to detect the false positive rate under different allele frequencies of case and control (see Supplementary Table S2). We found that the false positive rate of P(T) would decrease when control size was augmented. Furthermore, P(T)'s type 1 error rate was less than P(ca − co)'s when the sample size of the controls was five times larger than that of the cases. The false positive results were kept as low as possible when the sample size of the controls was four times larger than that of the cases (see Supplementary Table S2). In addition to population stratification between control and reference, allele frequency differentiation between case and control, between case and reference due to different genetic background were also simulated (see Supplementary Table S3). When there was population stratification between case and control without differentiation between case and reference, the type 1 error rate of T was less than significant levels (alpha = 0.05, 0.01, 0.001) while the type 1 error rate of ca−(co+re) and ca−co were large than significant levels. In another scenario of population stratification between case and reference without differentiation between case and control, results of T might be false positive. However increased sample size would control the type 1 error rate.
When the allele frequency in case was different from that in control, T's power was always higher than power of ca−co. We could find that when there was a slight change of allele frequency in case, the power of T was much higher than power of ca−co, indicating that our method had high sensitivity for GWAS. When the allele frequency in case was much higher than control's, the power approached to 1 with remarkable increase of T's power. When the allele frequency was rare, we could also draw the same conclusion that our method could keep false positive rate low and drastically increase the statistical power.

Results From the Psoriasis GWAS Datasets
We applied our newly developed method to two psoriasis GWAS (Nair et al., 2006;Fang et al., 2011) datasets to evaluate the performance. 1,590 subjects (915 cases, 675 controls) in the general research use (GRU) group and 1,133 subjects (431 cases and 702 controls) in the autoimmune disease only (ADO) group were analyzed.
For the GRU group and ADO group, SNPs that failed to pass the HWE exact test were filtered (we used the p = 0.001 as the threshold). Fisher exact test was used to compute the p-value of allele frequency for each SNP. The threshold of p-value was set as 1.2 × 10 −7 by Bonferroni Correction. Then we selected first 100 SNPs of lowest p-values for further analysis. Two different large public datasets, gnomad.genome. NFE (Non-Finnish European, N = 7,509) (ref1) (Lek et al., 2016) and gnomad.exome.NFE (Non-Finnish European, N = 55,860) (ref2) (Lek et al., 2016), were selected as the reference groups. So there were four combinations: GRU group vs. ref1 (GRU_ref1), GRU group vs. ref2 (GRU_ref2), ADO group vs. ref1 (ADO_ref1) and ADO group vs. ref2 (ADO_ref2). For each condition, we computed the p-value of our model (see Supplementary Table S4). Specially for the exome dataset (ref2), some SNPs not in the exome were excluded from the table.
To inspect the improvement of p-values in the whole level, we drew the Manhattan plot of GRU group (Figure 1) and ADO group (Figure 2) respectively. We observed notable changes of p-values before and after performing our method. The peaks of Figures 1, 2 jumped from about 1e-22 and 1e-32 to 1e-62 and 1e-52 respectively. Also the amount of SNPs with p-value between 1e-5 and 1e-8 has increased. The positive SNPs (rs12191877, rs9468933, etc.) became more positive due to the added genetic information of reference. Also, our method rescued a few SNPs which turned from negative to positive (see Supplementary  Table S5). Then, we searched pubmed literature and found some SNPs associated with psoriasis that had been reported by other studies. Moreover we checked these novel significant SNPs in GWAS catalog (Macarthur et al., 2016) to validate whether the associations were replicated in recent related large GWAS studies. By integrating the results of ref1 and ref2, we found that the SNPs with ref2 as reference were included in the results of ref1. And we presented the novel SNPs of GRU group ( Table 2) and ADO group ( Table 3). We calculated the genomic inflation factor, also known as lambda (λ). In GRU group, λ of P(T) and P(ca − co) were 1.08 and 0.98 respectively. And in ADO group, λ of P(T) and P(ca − co) were 0.81 and 0.96, respectively. Besides QQ plot of P(T) and P(ca − co)were drawn to inspect the distributions (Supplementary Figure S2).

Results From the Schizophrenia GWAS Datasets
We also applied our newly developed method to one schizophrenia GWAS  dataset to evaluate the performance. The SNPs that did not pass the HWE exact test were filtered (p = 0.001). Fisher exact test was used to compute the p-value of allele frequency for each SNP. There was no statistically significant SNP with the threshold as 7.14 × 10 −8 by Bonferroni Correction. The large public datasets, gnomad.genome.NFE (Non-Finnish European, N = 7509) (ref1) (Lek et al., 2016) was selected as reference to compute the p-value of our model (see Supplementary Table S6).
For the schizophrenia GWAS dataset , the typical Fisher exact test did not yield genome-wide significant findings. After the introduction of reference, we found that several novel SNPs were associated with schizophrenia ( Table 4). The Manhattan plot (Figure 3) clearly showed that before performing our method there were only a few SNPs with p-value below 1e-5 in the bottom figure. And after performing our method, plenty of SNPs came to the fore with p-value between 1e-5 and 1e-8. In addition, there were significant SNPs with p-value below 7e-8, some of which had been reported in previous studies. The SNP rs12140791 is located in NOS1AP (MIM: 605551) gene which is essential for brain development and function and of potential relevance to schizophrenia (Glessner et al., 2010). The rs17021364 and rs110974077 were reported to be associated with schizophrenia in a genome-wide meta-analysis (Wang et al., 2010). The rs35648 (p-value = 9.65E-5) was also reported by a previous large-scale GWAS (Shi et al., 2009). The genomic inflation factors were 0.77 and 1.01 for P(T) and P(ca−co) respectively. Also QQ plot of P(T) and P(ca − co) were drawn (Supplementary Figure S4).

Framework of Robust Reference Powered Association Test
Suppose we have three populations: case, control and public data pools, intuitively we want to merge the control and reference population to form a large control pool to gain more power on allele-disease association test. However, we are concerned about the potential population differentiation and genotyping platform difference between control and the reference (He et al., 2008).
A simplistic way to alleviate such concern is to perform a control (co) vs. reference (re) Fisher exact test (Fisher, 1922). If a p-value is not significant and the control sample size is not too small, then this concern is resolved. However, choosing the significance level is arbitrary and the decision is subjective. We need an objective version of such procedure to correct the effect of co-re (control vs. reference) difference.
Denote the p-value of 2 × 2 Fisher exact test (Fisher, 1922(Fisher, , 1954Agresti, 1992) of the SNP between case and control+reference as P(ca − (co + re)). Denote the p-value of 2 × 2 Fisher exact test of the SNP between control and reference as P(co − re). We define a test statistic: T = ln(P(ca − (co + re))) − ln(P(co − re)) (Figure 4). T will be smaller if the difference between case and control + reference is larger, while T will be larger if the difference between control and reference is larger. Therefore, T is a more robust statistic than P(ca − (co + re)) as it takes the control-reference difference into account by penalizing P(co − re).
Our null hypothesis is that the allele frequency of the SNP is equal among cases, controls and references. Under the null hypothesis, both P(ca − (co + re)) and P(co − re) are independent and uniformly distributed on (0, 1). Under the null hypothesis, − ln(P(ca − (co + re))) and − ln(P(ca − co)) are exponentially distributed with parameter 1 (Casella and Berger, The SNPs reported in pubmed were shown in bold and italics. The SNPs reported in pubmed was shown in bold and italics. The SNPs reported in pubmed was shown in bold and italics.

2002)
. And T is the difference of two exponentially distributed variables, thus T is Laplace distributed as T∼ Laplace (0, 1) (Mcneil, 2003). The one side p-value of T is as follow: , P(ca − (co + re)) ≤ P(co − re) 1 − P(co − re) 2P(ca − (co + re)) , P(ca − (co + re)) > P(co − re) From the above formula one can easily see that the P(co − re) acts as a penalizer/normalizer against P(ca − (co + re)). Therefore, our new statistic T is more robust than the simple P(ca − (co + re)) method in the presence of small controlreference differentiation. However, in the presence of strong population differentiation or genotyping platform difference, even our correction may not be effective, we therefore need to restrict P(co − re) to be not significant with a suggested threshold of 0.01.

The Psoriasis GWAS Datasets
We obtained the psoriasis datasets (Nair et al., 2006;Fang et al., 2011), as a part of the Collaborative Association Study of Psoriasis (CASP), from the Genetic Association Information Network (GAIN) database (dbGaP Accession No. phs000019.v1. p1), a partnership of the Foundation for the National Institutes of Health. All genotypes were filtered by checking for data quality (Feng et al., 2009). A dermatologist diagnosed all psoriasis (MIM: 177900) cases. Each participant's DNA was genotyped with the Perlegen 500K array. Both cases and controls agreed to sign the consent contract, and controls (≥18 years old) had no confounding factors relative to a known diagnosis of psoriasis.

The Schizophrenia GWAS Dataset
The schizophrenia dataset  came from the GAIN dataset (dbGaP Access No. phs000021.v1.p1), including 1,195 cases with schizophrenia (MIM: 181500) and 954 controls. The subjects were genotyped on AFFYMETRIX AFFY_6.0 platform. All subjects were at least 18 years old. The cases included 746 males (41.9 ± 10.8 years) and 449 females (43.0 ± 9.8 years); and the controls included 362 males (46.2 ± 13.7 years) and 592 females (45.0 ± 12.9 years). Affected subjects met lifetime DSM-IV criteria for schizophrenia (American Psychiatric Association 1994). Cases were excluded if they had worse than mild mental retardation, or if their psychotic illness was judged to be secondary to substance use or neurological disorders. Controls were excluded if they did not deny all of the following psychosis screening questions: treatment for or diagnosis of schizophrenia or schizoaffective disorder; treatment for or diagnosis of bipolar disorder or manic-depression; treatment for or diagnosis of psychotic symptoms such as auditory hallucinations or persecutory delusions.

DISCUSSION
Associations between SNPs and complex traits were detected by comparing frequencies of alleles in case and control group in GWAS (McCarthy et al., 2008). Several significant SNPs have been identified in classic GWAS studies (Duerr et al., 2006;Hunter, 2007;Hunter et al., 2007;Scott et al., 2007). However, other SNPs of low frequencies which contribute to the complex traits remain hidden in the false negative results Willer et al., 2008;Sanna et al., 2008;Nair et al., 2009;Wang et al., 2013). To identify more novel susceptibility loci, large-scale GWAS is a costly and time-consuming approach. It is necessary to design more sensitive and accurate statistical methods. Interestingly, the results indicated that our method could increase the power which may contribute to detecting more significant SNPs.
Our simulations focused on several representative cases to evaluate the performance of the new model. When the allele frequencies of case, control and reference were equal, the power of T was higher than power of ca−(co+re), while both less than false positive rate. False positive results would be produced in the presence of small population genetic differentiation between control and reference. However, as the control size increased, the false positive rate of P(T) would be reduced ( Table 1). The false positive rate would be controlled well, when the control sample was large enough. What's more, T's power was always higher than power of ca−co, when the allele frequencies of case and control were different, indicating that our method was more robust and had high sensitivity for GWAS.
Considering that allele frequency divergence of multiple variants due to different platforms is more serious than slight population stratification in practical application, we designed the statistic based on single variant rather than multiple variants. Allowing slight population stratification between control and reference, the simulation has shown the large sample size of control would suppress the false positive results. Moreover the statistical power and utility of our method were also elevated.
In the aforementioned application of our method to the psoriasis GWAS genetic datasets (Nair et al., 2006;Fang et al., 2011), we selected two large public datasets as reference groups (Lek et al., 2016). With different references, the model will compute different p-values for the test statistic. In the case of scenario, the p-value is lower than threshold of 0.01, indicating significant differentiation between control and reference, the result would be false positive. In this situation, selecting an appropriate dataset as the reference is the key to obtain better result. In addition, selecting multiple different references is feasible with the online tool. By comparing and integrating the output of different references, reasonable significant P(T) could provide more effective information for SNPs associated with the traits.
Novel SNPs with weak positive signals could be discovered when the sample size of case and control are insufficient in GWAS (He et al., 2009). With the support of reference database, the new p-value of some false negative genetic loci would decrease significantly down to the threshold. And our test statistic T is more robust than P(ca − (co + re)) as it takes the controlreference difference into account by penalization of P(co − re).
We presented a new statistic T to use large public database as reference to reduce concern of potential population stratification. And the new statistic proposed here is effective to discover novel genome-wide significant loci with both small and large sample sizes.

AUTHOR CONTRIBUTIONS
YW conceived the idea and developed the software. YL, MH, and XL contributed data analysis, generating tables and figures, and manuscript writing. YW, YL, MH, XL, YS, and LJ contributed the theoretical analysis and manuscript revision. MX helped support the GWAS datasets. YW, YL, MH, XL, MZ, JW, and MX contributed to scientific discussion. All authors contributed to final revision of the manuscript.

ACKNOWLEDGMENTS
We thank the Fudan University High-End Computing Center for supporting computations involved in this study. This manuscript has been released as a Pre-Print at https://www.biorxiv.org/ content/early/2018/05/08/316703.