GSTM1 Copy Number Is Not Associated With Risk of Kidney Failure in a Large Cohort

Deletion of glutathione S-transferase µ1 (GSTM1) is common in populations and has been reported to be associated with chronic kidney disease progression in some studies. This association needs to be validated. We estimated GSTM1 copy number using whole exome sequencing data in the DiscovEHR cohort. Kidney failure was defined as requiring dialysis or receiving a kidney transplant, using data from the electronic health record and linkage to the United States Renal Data System, or as a most recent eGFR < 15 ml/min/1.73 m². In a cohort of 46,983 unrelated participants, 28.8% of blacks and 52.1% of whites had 0 copies of GSTM1. Over a mean of 9.2 years of follow-up, 645 kidney failure events were observed in 46,187 white participants, and 28 in 796 black participants. No significant association was observed between GSTM1 copy number and kidney failure in Cox regression adjusting for age, sex, BMI, smoking status, genetic principal components, and comorbid conditions (hypertension, diabetes, heart failure, coronary artery disease, and stroke), whether using a genotypic, dominant, or recessive model. In sensitivity analyses, GSTM1 copy number was not associated with kidney failure in participants who were 45 years or older at baseline, had baseline eGFR < 60 ml/min/1.73 m², or had a baseline year between 1996 and 2002. In conclusion, we found no association between GSTM1 copy number and kidney failure in a large cohort study.


INTRODUCTION
Glutathione S-transferase µ1 (GSTM1) belongs to the family of glutathione S-transferases that metabolize a broad range of reactive oxygen species and aldehydes (Hayes et al., 2005; Yang et al., 2009). Loss of GSTM1 is very common in the population; ~50% of whites and 27% of blacks have 0 copies of GSTM1 (Garte et al., 2001). Deletion of one or both copies of the gene results in a reduced amount of GSTM1 (Board et al., 1990) and could lead to increased oxidative stress due to a diminished ability to neutralize reactive chemical species. An association between loss of GSTM1 and chronic kidney disease (CKD) progression has been reported in the African American Study of Kidney and Hypertension (AASK) (Chang et al., 2013). An association between GSTM1 copy number and kidney failure was also reported, in both blacks and whites, in the Atherosclerosis Risk in Communities (ARIC) study, which had a larger sample size (Tin et al., 2017). However, smaller case-control studies have had mixed findings (Yang et al., 2004; Agrawal et al., 2007; Tiwari et al., 2009; Gutierrez-Amavizca et al., 2013; Nomani et al., 2016). The inconsistent results may be due to insufficient sample sizes and to differences in populations and disease background.
August 2019 | Volume 10 | Article 765 | Frontiers in Genetics | www.frontiersin.org
The Geisinger MyCode® Community Health Initiative is an EHR-linked biobank for precision medicine research (Carey et al., 2016). The population served by Geisinger has low rates of out-migration; thus, the electronic health record (EHR) data are relatively complete longitudinally, with a median of 14 years of follow-up (Carey et al., 2016). In addition, a high proportion of participants were found to have first- and second-degree relatedness through cryptic relatedness analysis (Staples et al., 2018). Through the ongoing DiscovEHR collaboration with the Regeneron Genetics Center, whole exome sequence (WES) data are available from ~92,000 MyCode® participants to date (Abul-Husn et al., 2016; Carey et al., 2016; Dewey et al., 2016). These comprehensive clinical data are linked to matched genetic data and provide power to identify disease-genetic associations (Rader and Damrauer, 2016).
Confirming whether or not loss of GSTM1 increases the risk of kidney failure is important, given the high prevalence of GSTM1 loss in the population and the serious morbidity and mortality associated with kidney failure. In this study, we examine the relationship between GSTM1 copy number and incident kidney failure in the Geisinger MyCode cohort.

Study Population
The study population included 72,756 participants in the Geisinger-Regeneron DiscovEHR cohort with WES data and at least three serum creatinine values. We excluded participants who had baseline eGFR < 15 ml/min/1.73 m² or a history of dialysis or transplant (n = 417), baseline age < 18 years (n = 1,960), missing BMI values (n = 230), or unknown smoking status (n = 2,694). Cryptic relatedness was evaluated using the identity-by-descent (IBD) method in PLINK 1.9 (Chang et al., 2015). One participant from each related pair with PI_HAT ≥ 0.125 was removed to reduce confounding from other shared genetic/environmental factors that could not be assessed, resulting in 46,983 participants (Supplementary Figure 1). We estimated eGFR using the CKD-Epidemiology Collaboration equation (Levey et al., 2009; Levey and Stevens, 2010). We defined the study period as running from baseline, the time of the second serum creatinine measurement, to the time of a renal endpoint or the last serum creatinine test.
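As an illustration of the eGFR calculation, the 2009 CKD-EPI creatinine equation (Levey et al., 2009) can be sketched as follows. This is a minimal sketch and not the study's analysis code (the analyses were run in R); the function name ckd_epi_2009, its argument names, and the assumption that serum creatinine is recorded in mg/dl are introduced here for illustration only.

```python
def ckd_epi_2009(scr_mg_dl, age_years, female, black):
    """Estimate GFR (ml/min/1.73 m^2) with the 2009 CKD-EPI creatinine equation.

    Assumes serum creatinine in mg/dl (an assumption of this sketch).
    """
    kappa = 0.7 if female else 0.9          # sex-specific creatinine scale
    alpha = -0.329 if female else -0.411    # sex-specific exponent below kappa
    ratio = scr_mg_dl / kappa
    egfr = (141.0
            * min(ratio, 1.0) ** alpha
            * max(ratio, 1.0) ** -1.209
            * 0.993 ** age_years)
    if female:
        egfr *= 1.018
    if black:
        egfr *= 1.159
    return egfr


if __name__ == "__main__":
    # Hypothetical participant: 50-year-old white male, creatinine 1.4 mg/dl.
    print(round(ckd_epi_2009(1.4, 50, female=False, black=False), 1))
```

With a function of this kind, the baseline exclusion (eGFR < 15) and the CKD subgroup (eGFR < 60) reduce to simple comparisons on the returned value.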

Outcome Definition
Kidney failure was assigned if any of the following criteria was met: 1) the last available eGFR was < 15 ml/min/1.73 m²; 2) the EHR showed an International Classification of Diseases (ICD) code for end-stage renal disease (ESRD) (ICD9: 585.6, ICD10: N18.6); or 3) receipt of dialysis or transplant per linkage to the United States Renal Data System (USRDS). For participants who met more than one criterion, the earliest documented date among the criteria met was considered the kidney failure date.
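The outcome rule above can be sketched as a small function that collects the date of each criterion met and returns the earliest one. This is an illustrative sketch only; the function name kidney_failure_date and its arguments are hypothetical and do not correspond to actual EHR or USRDS field names.

```python
from datetime import date


def kidney_failure_date(last_egfr, last_egfr_date, esrd_code_date, usrds_date):
    """Return the earliest date on which any criterion was met, or None.

    last_egfr / last_egfr_date: most recent eGFR value and its date.
    esrd_code_date: first date of an ESRD ICD code (585.6 or N18.6), or None.
    usrds_date: first dialysis/transplant date from USRDS linkage, or None.
    All argument names are hypothetical.
    """
    candidates = []
    if last_egfr is not None and last_egfr_date is not None and last_egfr < 15:
        candidates.append(last_egfr_date)
    if esrd_code_date is not None:
        candidates.append(esrd_code_date)
    if usrds_date is not None:
        candidates.append(usrds_date)
    return min(candidates) if candidates else None


if __name__ == "__main__":
    # Hypothetical participant who meets criteria 1 and 3.
    print(kidney_failure_date(12.0, date(2015, 6, 1), None, date(2014, 3, 15)))
```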

GSTM1 and GSTM5 Copy Number Estimation
DNA was sequenced in two batches at the Regeneron Genetics Center. The WES data processing has been described elsewhere (Staples et al., 2018), and details are also provided in the supplementary methods. Copy number of GSTM1 was estimated using a previously reported method (Tin et al., 2017). Briefly, sequence coverage was normalized using CLAMMS software (Packer et al., 2016). Normalized coverage of all eight exons of GSTM1 was summed for each participant. Thresholds for GSTM1 copy number were determined empirically according to the distribution of the sum of normalized coverage (Supplementary Figure 2). The copy number of GSTM5 was estimated in the same way as that of GSTM1 (Supplementary Notes).
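The thresholding step can be sketched as follows. The study's actual cut-offs were chosen empirically from the coverage histogram and are not given in the text, so the cut-off values in the usage example are placeholders, and the function name estimate_copy_number is hypothetical. The sketch also assumes that per-exon normalized coverage has already been produced upstream (for example by CLAMMS).

```python
def estimate_copy_number(exon_coverage, low_cut, high_cut):
    """Assign 0, 1, or 2 copies from summed normalized exon coverage.

    exon_coverage: normalized coverage for the eight GSTM1 exons.
    low_cut / high_cut: empirically chosen boundaries between the
    0-copy/1-copy and 1-copy/2-copy clusters (placeholders here).
    """
    if len(exon_coverage) != 8:
        raise ValueError("expected normalized coverage for eight exons")
    if not low_cut < high_cut:
        raise ValueError("low_cut must be smaller than high_cut")
    total = sum(exon_coverage)
    if total < low_cut:
        return 0
    if total < high_cut:
        return 1
    return 2


if __name__ == "__main__":
    # Placeholder cut-offs, not the values used in the study.
    print(estimate_copy_number([0.5] * 8, low_cut=2.0, high_cut=6.0))
```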

Statistical Analyses
Hardy-Weinberg equilibrium (HWE) was tested for GSTM1 copy number using a chi-square test. A more stringent significance level of 1 × 10⁻⁵ was selected for whites because of the large sample size (46,187). A nominal significance level of 0.05 was used for blacks because of the small sample size (796). Missing baseline BMI values were imputed using the mean of all available BMI values for each participant, and missing smoking status at baseline was imputed using the most recent recorded smoking status in the EHR (Supplementary Figure 3). Baseline characteristics were compared using ANOVA for continuous variables and chi-square tests for categorical variables. A p < 0.05/N was considered statistically significant after Bonferroni adjustment for multiple comparisons, where N is the number of variables compared. Kaplan-Meier curves were plotted by GSTM1 copy number, and survival differences between genotype groups were assessed using a log-rank test. All subsequent analyses were stratified by race given the difference in allele frequency and sample size. Two models were evaluated. In model 1, the Cox proportional hazards model was adjusted for age, sex, and the first four genetic principal components; model 2 was additionally adjusted for risk factors including baseline eGFR, smoking status, BMI, hypertension, diabetes, coronary artery disease, heart failure, and stroke. Three genetic models were evaluated: 1) genotypic model, using copy numbers 0, 1, and 2 as three categories; 2) dominant model (0 or 1 copy vs. 2 copies of GSTM1); and 3) recessive model (1 or 2 copies vs. 0 copies of GSTM1). To explore whether the effect of GSTM1 loss was stronger in specific higher-risk subgroups, sensitivity analyses were completed in subsets of 1) older participants (baseline age ≥ 45 years), 2) participants with CKD (baseline eGFR < 60 ml/min/1.73 m²), and 3) participants with longer follow-up (baseline year 1996-2002).
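The three genetic models differ only in how copy number is coded before it enters the Cox model. The sketch below shows one possible coding, with 2 copies as the reference group in the genotypic and dominant models and with the 0-copy group flagged in the recessive model; the reference direction, the function name encode_copy_number, and the indicator names are assumptions made for illustration.

```python
def encode_copy_number(copy_number, model):
    """Return indicator covariates for one participant under a genetic model.

    genotypic: separate indicators for 0 and 1 copies (reference: 2 copies).
    dominant:  single indicator for 0 or 1 copy vs. 2 copies.
    recessive: single indicator for 0 copies vs. 1 or 2 copies.
    """
    if copy_number not in (0, 1, 2):
        raise ValueError("copy number must be 0, 1, or 2")
    if model == "genotypic":
        return {"cn0": int(copy_number == 0), "cn1": int(copy_number == 1)}
    if model == "dominant":
        return {"cn0_or_1": int(copy_number <= 1)}
    if model == "recessive":
        return {"cn0": int(copy_number == 0)}
    raise ValueError("unknown genetic model: " + model)


if __name__ == "__main__":
    for cn in (0, 1, 2):
        print(cn, [encode_copy_number(cn, m)
                   for m in ("genotypic", "dominant", "recessive")])
```

The resulting indicators would then be entered alongside the adjustment covariates of model 1 or model 2.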
Power was estimated using the powerCT() function in the R package powerSurvEpi 0.1.0. Given the current sample size and number of events, we had power of ≥ 0.8 to detect a hazard ratio of at least 1.4 in the white cohort and 2.8 in the black cohort. All analyses were performed using R (version 3.4.3).
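To give a sense of how power depends on the number of events, the hazard ratio, and the group split, the sketch below uses Schoenfeld's approximation for a two-group comparison under proportional hazards. This is a different, simpler calculation than powerCT() and is not expected to reproduce the figures above; the function name schoenfeld_power and the inputs in the usage example are hypothetical.

```python
from math import log, sqrt
from statistics import NormalDist


def schoenfeld_power(n_events, hazard_ratio, prop_exposed, alpha=0.05):
    """Approximate power of a two-sided test of HR = 1 (Schoenfeld formula).

    n_events: total number of observed events in both groups.
    prop_exposed: proportion of participants in the exposed group.
    """
    if not 0.0 < prop_exposed < 1.0:
        raise ValueError("prop_exposed must lie strictly between 0 and 1")
    norm = NormalDist()
    z_alpha = norm.inv_cdf(1.0 - alpha / 2.0)
    signal = sqrt(n_events * prop_exposed * (1.0 - prop_exposed)) * abs(log(hazard_ratio))
    return norm.cdf(signal - z_alpha)


if __name__ == "__main__":
    # Hypothetical inputs: 200 events, half of participants exposed.
    for hr in (1.2, 1.4, 1.6):
        print(hr, round(schoenfeld_power(200, hr, 0.5), 3))
```

Under this approximation, power falls as the exposed group becomes a smaller share of the cohort, which is one reason the detectable hazard ratio can differ between genetic models.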

Study Cohort and Baseline Characteristics
There were 46,983 unrelated participants included in the main analysis. Among these, 46,187 (98.3%) were white, and 796 (1.7%) were black. The frequency of GSTM1 copy number differed between blacks and whites (Supplementary Figure 4). Almost half of blacks (49.2%) had one copy of GSTM1 compared with only 39.6% of whites, and more than half (52.1%) of whites had zero copies of GSTM1 compared with 28.8% of blacks. The chi-square test indicated that GSTM1 copy number was consistent with HWE (Table 1; p values were 1 × 10⁻⁴ for whites and 0.618 for blacks).
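A chi-square test of HWE for copy number can be sketched as below, treating the deletion as one allele and the intact gene as the other, with one degree of freedom. This is an illustrative sketch rather than the study's code; the function name hwe_chi_square is hypothetical, and the counts in the usage example are invented.

```python
from math import erfc, sqrt


def hwe_chi_square(n0, n1, n2):
    """Chi-square test of HWE from counts of 0-, 1-, and 2-copy carriers.

    Returns (chi2, p_value) with one degree of freedom. Assumes both
    alleles are present, so that no expected count is zero.
    """
    n = n0 + n1 + n2
    q = (2 * n0 + n1) / (2 * n)   # frequency of the deletion allele
    p = 1.0 - q                   # frequency of the intact allele
    expected = (n * q * q, 2 * n * p * q, n * p * p)
    observed = (n0, n1, n2)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    return chi2, erfc(sqrt(chi2 / 2.0))


if __name__ == "__main__":
    # Invented counts for illustration only.
    chi2, p_value = hwe_chi_square(410, 400, 90)
    print(round(chi2, 3), p_value)
```

The returned p value would then be compared with the race-specific significance level (1 × 10⁻⁵ for whites, 0.05 for blacks).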
Baseline demographic and clinical characteristics of the participants stratified by race and GSTM1 copy number are provided in Table 1. The average baseline age for whites and blacks was 51 and 44 years, respectively. The mean baseline eGFRs were 91 and 92 ml/min/1.73 m², respectively. Prevalence of hypertension was 29% in whites and 44% in blacks, and prevalence of type 2 diabetes was 13% in whites and 21% in blacks at baseline. No statistically significant differences in baseline characteristics were observed among the three genotype groups after adjustment for multiple comparisons.

Time-to-Event Analysis
Over a mean follow-up of 9.2 years, there were a total of 645 kidney failure events in 46,187 white participants. Over a mean follow-up of 5.5 years, there were 28 events in 796 black participants. In both white and black participants, kidney failure cases were more often male, were older, and had lower eGFR (Table 1). As expected, among white participants, kidney failure cases had a higher prevalence of comorbid conditions including hypertension, diabetes, and cardiovascular diseases (Table 1). The number of cases in blacks was too small to provide adequate statistical power. No significant difference in kidney failure-free survival was found across GSTM1 copy numbers by log-rank test (p = 0.9 and 0.5 for whites and blacks, respectively; Figure 1).
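The kidney failure-free survival curves compared here are Kaplan-Meier estimates, computed separately for each copy number group. A minimal sketch of the estimator is given below; it is not the study's code, the function name kaplan_meier is hypothetical, and the follow-up times in the usage example are invented.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate of event-free survival.

    times: follow-up time for each participant.
    events: 1 if kidney failure occurred at that time, 0 if censored.
    Returns a list of (time, survival) pairs at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all participants whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            survival *= 1.0 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving
    return curve


if __name__ == "__main__":
    # Invented follow-up times (years) and event indicators.
    print(kaplan_meier([2.0, 3.5, 3.5, 5.0, 8.0, 9.2], [1, 0, 1, 0, 1, 0]))
```

Applying this function to each genotype group gives the step curves that a log-rank test then compares.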
Findings were similar in Cox regression analyses adjusting for other covariates (Table 2). In the genotypic model, after adjusting for age, sex, and the first four PCs, there was no difference in risk of kidney failure among white participants with 0 copies [hazard ratio (HR) 1.01, 95% confidence interval (CI): 0.81-1.48] or 1 copy (HR 1.08, 95% CI: 0.80-1.47), compared to those with 2 copies. Findings were similar for black participants in the genotypic model adjusting for age, sex, and the first four PCs (0 copies HR 0.68, 95% CI: 0.22-2.15; 1 copy HR 1.13, 95% CI: 0.44-2.90). In the dominant model, after adjusting for age, sex, and the first four PCs,

DISCUSSION
To our knowledge, this is the largest study to investigate the association of GSTM1 copy number variation with kidney failure. There were a total of 46,187 unrelated whites and 796 blacks in the study with an average of 9.3 years' follow-up. The frequencies of the GSTM1 copy numbers were very similar to those reported previously (Garte et al., 2001; Chang et al., 2013; Tin et al., 2017). Our results showed no significant association between GSTM1 copy number and risk of kidney failure in unadjusted or adjusted analyses, whether using a genotypic, dominant, or recessive genetic model. Data from the ARIC and AASK cohorts suggested that the loss of GSTM1 increased the risk of kidney failure or accelerated CKD progression (Chang et al., 2013; Bodonyi-Kovacs et al., 2016; Tin et al., 2017). In ARIC, a community-based cohort of middle-aged black and white participants, there were 3,461 white participants with WES reads. In fully adjusted models among whites in ARIC, loss of GSTM1 was associated with risk of kidney failure (0 or 1 copy vs. 2 copies: HR 2.54; 95% CI: 1.32-4.88). It is possible that differences in baseline characteristics of the study populations could explain the differing findings. Our cohort had a lower prevalence of smoking (never smoking 48 vs. 38%), a higher prevalence of diabetes (14 vs. 8%), and was more contemporary (median baseline year 2004 vs. ARIC baseline years 1987-1989). However, in sensitivity analyses in 14,572 white participants with baseline year between 1996 and 2002 in our cohort, there was no association between GSTM1 loss and risk of kidney failure. There is also the possibility that GSTM1 loss may only be deleterious when GFR falls below certain levels or in the setting of specific types of kidney disease.
In the AASK study, a randomized trial comparing different antihypertensive medications and levels of blood pressure control in blacks with CKD attributed to hypertension, 692 participants had GSTM1 genotyping completed. Loss of GSTM1 in AASK was associated with increased risk of CKD progression (0 copies: HR 1.88; 95% CI: 1.07-3.30; 1 copy: HR 1.68; 95% CI: 1.00-2.84). While there were few blacks in our study population with CKD, we found no association between loss of GSTM1 and risk of kidney failure in 3,129 white participants with baseline eGFR < 60 ml/min/1.73 m². Several smaller case-control studies tested a recessive genetic model (0 copies vs. 1 or 2 copies) of GSTM1 and found that having 0 copies of GSTM1 was associated with kidney failure (Agrawal et al., 2007; Chang et al., 2013; Gutierrez-Amavizca et al., 2013; Suvakov et al., 2013). In our study, however, we examined both recessive and dominant genetic models and found no associations between GSTM1 copy number and risk of kidney failure.
Some limitations of this study are worth noting. First, the genotypes of GSTM1 in this study were derived from WES, and the thresholds were determined empirically from the histogram. Multiplex PCR validation was not performed for this study. However, this method was used in the ARIC study, which showed 99.3% agreement with PCR when comparing 0 copies with ≥ 1 copy (Tin et al., 2017). Moreover, the frequencies of GSTM1 copy numbers in this study are similar to those reported previously for both whites and blacks (Chang et al., 2013; Tin et al., 2017), and they are consistent with Hardy-Weinberg equilibrium. We also examined the possibility of mismapping due to the complexity of the GSTM locus. The histogram of the sum of normalized coverage of GSTM5, a paralog of GSTM1, is unimodal. Based on this evidence, we believe that the copy number results derived from WES had a low error rate and can be used with confidence for analysis. Second, the sample size of blacks was small, with only 796 patients available for analysis, limiting the power to examine an association between GSTM1 and kidney failure in this population. A strength of this study is the large sample size of whites, which included 46,187 patients, allowing for large subgroup analyses in older patients, those with baseline eGFR < 60 ml/min/1.73 m², and those with longer duration of follow-up. Finally, other factors such as SNPs and haplotypes in the GSTM cluster, CNV of other GST genes such as GSTT1, and environmental factors such as diet were not examined. A recent study identified substantial differences in SNP-based haplotypes in the GSTM cluster among groups of individuals with different GSTM1 CNV, suggesting a role for local genetic context in the deletion frequency and thus in the deletion-related effects on various diseases (Khrunin et al., 2016).
Studies that examined the combined effect of GSTM1 and GSTT1 deletion showed increased risk for ESRD in patients with the null genotype of both GSTM1 and GSTT1 (Agrawal et al., 2007; Suvakov et al., 2013; Nomani et al., 2016). GSTM1 deletion has been reported to be protective for lung cancer among individuals with a high intake of cruciferous vegetables and deleterious among those with a low intake (London et al., 2000; Zhao et al., 2001; Wang et al., 2004; Brennan et al., 2005; Carpenter et al., 2009). We were not able to analyze effect modification between GSTM1 copy number and cruciferous vegetable intake.
In conclusion, our study does not support an association between loss of GSTM1 and increased risk of kidney failure in whites. Additional research is needed to confirm whether loss of GSTM1 increases risk of kidney failure in certain subgroups such as blacks. Interactions with diet and genetic context will also be examined in the future.

DATA AVAILABILITY
The individual EHR and genetics datasets (even de-identified) used and/or analyzed during the current study are not publicly available. The availability restriction is a condition of the ethical approval and of Geisinger policy and terms with RGC. Collaboration requests and data use agreements with Geisinger are necessary to obtain access to the de-identified EHR data.

ETHICS STATEMENT
All participants provided their informed written consent, and the study was approved by the Geisinger Institutional Review Board.

AUTHOR CONTRIBUTIONS
AT, AC, and ML designed the study; DH extracted data; YZ and WZ performed the analysis and drafted the manuscript. MW, ML, and AC critically reviewed the paper. RGC supported the study and reviewed the manuscript. All authors approved the final version of the manuscript.

FUNDING
The Regeneron Genetics Center funded the collection of genomic study samples, the generation of genotype data, and genotype imputation. Geisinger provided funding for clinical data extraction and genetic association analysis. AC is supported by the National Institutes of Health/National Institute of Diabetes and Digestive and Kidney Diseases grant K23 DK106515-01.