Association of Candidate Gene Polymorphisms With Chronic Kidney Disease: Results of a Case-Control Analysis in the Nefrona Cohort

Chronic kidney disease (CKD) is a major risk factor for end-stage renal disease, cardiovascular disease and premature death. Despite classical clinical risk factors for CKD and some genetic risk factors have been identified, the residual risk observed in prediction models is still high. Therefore, new risk factors need to be identified in order to better predict the risk of CKD in the population. Here, we analyzed the genetic association of 79 SNPs of proteins associated with mineral metabolism disturbances with CKD in a cohort that includes 2,445 CKD cases and 559 controls. Genotyping was performed with matrix assisted laser desorption ionization–time of flight mass spectrometry. We used logistic regression models considering different genetic inheritance models to assess the association of the SNPs with the prevalence of CKD, adjusting for known risk factors. Eight SNPs (rs1126616, rs35068180, rs2238135, rs1800247, rs385564, rs4236, rs2248359, and rs1564858) were associated with CKD even after adjusting by sex, age and race. A model containing five of these SNPs (rs1126616, rs35068180, rs1800247, rs4236, and rs2248359), diabetes and hypertension showed better performance than models considering only clinical risk factors, significantly increasing the area under the curve of the model without polymorphisms. Furthermore, one of the SNPs (the rs2248359) showed an interaction with hypertension, being the risk genotype affecting only hypertensive patients. We conclude that 5 SNPs related to proteins implicated in mineral metabolism disturbances (Osteopontin, osteocalcin, matrix gla protein, matrix metalloprotease 3 and 24 hydroxylase) are associated to an increased risk of suffering CKD.


INTRODUCTION
Chronic kidney disease (CKD) is a major risk factor for end-stage renal disease, cardiovascular disease and premature death (Go et al., 2004;van der Velde et al., 2011). Nowadays, its worldwide prevalence is estimated in 7.2%, increasing dramatically among elderly people (Zhang and Rothenbacher, 2008), so it is predicted that CKD will become a highly prevalent disease due to population aging. Indeed, the Global Burden of Diseases, Injuries, and Risk Factors Study 2015 calculated that, in absolute numbers, its worldwide prevalence increased a 26.9% over the last 10 years , and a similar increase has been observed in the average years living with disability of CKD patients, which increased a 23.8% (GBD 2015 Disease andInjury Incidence andPrevalence Collaborators, 2016). Thus, considered as a major public health problem, international efforts are needed for CKD prevention, early detection and treatment. Different risk factors for CKD have been successfully identified, being the leading ones diabetes, hypertension and obesity, but also including gender, age and smoking (Kazancioglu, 2013).
Currently, it is accepted that CKD and its associated risk factors present a considerable genetic component. Nowadays, Genome Wide Association Studies (GWAS) have identified novel genetic variants associated with CKD, its progression, and its associated pathologies (Wuttke and Köttgen, 2016;Limou et al., 2018). Some of these SNPs are located in genes coding for proteins whose alteration could be causative of CKD. For example, Köttgen et al. (2009) found association of the UMOD gene with CKD; mutations on this gene were associated to rare autosomal dominant tubulointerstitial disease that leads to CKD (Devuyst and Pattaro, 2018). Later, in 2013, it was demonstrated that these polymorphisms lead to increased uromodulin expression, and mice overexpressing UMOD showed salt-sensitive hypertension and age-dependent renal lesions (Trudu et al., 2013).
Due to this huge bulk of knowledge and the clear association of some polymorphisms to CKD (Köttgen et al., 2009(Köttgen et al., , 2010Pattaro et al., 2016;Parsa et al., 2017), efforts have been focused on designing test for CKD diagnosis or prognosis based on these findings but, to this day, none has reached the clinical practice (Ma et al., 2017;Thio et al., 2018). Indeed, with vast number of markers in GWAS analyses, true 'hits' may become lost in a sea of false positives, due to the false discovery rate used to adjust p-values. The candidate gene approach is another widely used option to assess genetic predisposition to diseases. Candidate gene studies, although being incapable of discovering new genes or gene combinations, tend to have rather high statistical power, and a scientific hypothesis behind its rationale.
Mineral metabolism disturbances, defined as CKD mineral bone disorder (CKD-MBD), are a very common complication of CKD patients. The CKD-MBD is a disruption in the normal interplay between the kidney, skeleton, and cardiovascular system. Apart from biochemical and bone abnormalities, CKD-MBD is defined by an increase in vascular calcification. Those disturbances occur very early in CKD [increases in Fibroblast Growth Factor 23 have been described from CKD stage 2 (Wolf, 2012)], have a high impact in outcomes (Valdivielso et al., 2017) and, furthermore, are associated with CKD progression (Schwarz et al., 2006). Thus, CKD-MBD alterations are not only a consequence of CKD, but can be also playing a role in the causative pathway.
Therefore, in the present study, we assessed the association of 79 SNPs of genes reported in the literature as being associated to CKD-MBD and vascular calcification, (Rong et al., 2014;Tuñón-Le Poultel et al., 2014;Ammirati et al., 2015;Higgins et al., 2015;Xu and Sun, 2015) with CKD prevalence in the NEFRONA cohort.

Study Design and Participants
The NEFRONA study is a prospective, multicenter, observational study, in which 2,445 CKD subjects were recruited from 81 nephrology services and dialysis units throughout Spain, from October 2010 to June 2012 (Junyent et al., 2010). Patients between 18 and 74 years of age were eligible if they had CKD stage 3 or higher as defined by current guidelines [glomerular filtration rate below 60 mL/min/1.73 m 2 estimated using the Modification of Diet in Renal Disease (MDRD) 4 equation (Levey et al., 1999;Stevens et al., 2006)]. During the same period, 559 controls (MDRD4 above 60 mL/min/1.73 m 2 ) were recruited from Primary Care centers. Information about classical risk factors for CKD (diabetes, hypertension, and tobacco) as well as age and sex were recorded.
Exclusion criteria for both groups included: pregnancy, active infections, life expectancy lower than 12 months, history of a previous cardiovascular event, carotid artery surgery or any organ transplantation. All patients signed an informed consent and the local Ethics Committee of each hospital approved the protocol.

SNP Selection and Genotyping
We used blood samples from the NEFRONA study stored in the Biobank of the REDinREN (Calleros-Basilio et al., 2016) in Alcala de Henares, Madrid, for DNA extraction from all 2,445 CKD cases and 559 controls. Seventy-nine SNPs located in twenty-nine genes (Supplementary Table S1) related with CKD-MBD and vascular calcification that have been reported in the scientific literature (Rong et al., 2014;Tuñón-Le Poultel et al., 2014;Ammirati et al., 2015;Higgins et al., 2015;Xu and Sun, 2015) were genotyped. We only considered those SNPs with potential functional implications, so only those present in coding, promoter or 3 untranslated region were considered.
Genotyping was performed with matrix assisted laser desorption ionization-time of flight mass spectrometry in the Sequenom MassARRAY platform R , in CEGEN (Santiago de Compostela). As a quality control, the clusters for each SNP were reviewed and the SNPs with low genotyping percentage or not meeting Hardy Weinberg Equilibrium (HWE) were eliminated. Assessment of HWE was performed by means of a chi-squared test and an exact test for each SNP, considering a SNP in HWE when this was not rejected by any test. Samples with low genotyping percentage and with insufficient quality were also discarded. We also checked consistency within samples duplicating the same sample in the same plate and in different plates. In addition, a trio of samples from the Coriell Institute biorepository, for which published data were available, were included in each chip to verify the effectiveness of the genotyping.

Statistical Analysis
The clinical heterogeneity between cases and controls was assessed considering demographic and clinical variables. Mean and standard deviation or absolute frequency and percentage were computed for quantitative and qualitative variables, respectively, assessing differences with Mann-Whitney's U-test and Fisher test.
Hardy Weinberg Equilibrium was checked for all SNPs using data from both controls and cases. Since HWE was accepted for all SNPS, no specific evaluation was conducted to assess HWE only in controls. Genetic association was evaluated using logistic regression models with five different inheritance models: dominant, recessive, codominant, overdominant and additive models. Akaike's Information Criterion (AIC) was used to select the model with the best fit and therefore compute the corresponding p-value, odds-ratio (OR) and 95% confidence intervals. A permutation p-value was also computed for those SNPs found significantly associated to CKD. For this purpose, the likelihood ratio (LR) was assessed in 20,000 random permutations of case and control labels, computing the permutation p-value as the sample probability of obtaining a LR greater or equal to the one initially obtained. Moreover, potentially confounding variables were also included in each SNP model, performing a likelihood ratio test (LRT) to obtain an adjusted p-value.
To represent missing genotype data, SNPs were sorted by increasing percent of missing genotypes (Supplementary Table S2). The percentage of missing genotype was lower than 7% for 10 of the 12 SNPs that were significant in the single association analysis (6.16% for rs35068180 and rs35068180; 6.19% for rs4236, rs731236 and rs9138, 6.26% for rs1126616 and rs2238135, 6.32% for rs679620, 6.39% for rs1800247 and rs385564), however, it was quite high for two SNPs (46.3% for rs2248359 and 46.34% for rs1564858). No imputation was considered for the analysis, so all of them were made on a complete-cases basis. Thus, for the single association analysis, all the data cases available for each SNP were used but, for the multivariate models a reduced data base with 1603 out of 3004 cases (53.4%) containing complete data for all 12 SNPS was considered.
Linkage disequilibrium (LD) between all pairs of SNPs was quantified assessing the two-loci r 2 coefficient. The statistical significance of the obtained r 2 values was evaluated using the chi-square statistic for LD.
Multiple logistic regression models were used to assess the combined effect of single SNPs on CKD risk. All possible models containing any combination of the single SNPs found to be statistically significant, were fitted considering the inheritance setting determined in the single association analysis. Using a step-wise algorithm and the model with the lowest AIC was chosen. The total number of fitted models was m 1 + m the number of selected SNPs in the single association analysis. Once an initial set of associated SNPs was determined, interaction effects were considered, both between SNPs and between SNPs and CKD risk factors (diabetes and hypertension). A step-wise algorithm was again used to determine the final model including both SNPs and clinical risk factors, calculating the corresponding p-values, OR and 95% confidence intervals.
Potentially confounding variables were also included to calculate adjusted p-values.
Finally, sensitivity and specificity analyses were implemented to compare the model with the SNPs, the model with diabetes and hypertension and the model with all the variables, calculating the receiver operating characteristic (ROC) curves and the area under the curve (AUC). A bootstrap procedure, using 1,000 simulations, was used to obtain a robust estimation of the AUC and a confidence interval was computed using 2.5 and 97.5% percentiles of the bootstrap distribution. DeLong tests were used to compare the curves among models and an optimal threshold for the predicted probability obtained from the logistic multivariate model was determined and, thus, sensitivity and specificity with the corresponding 95% confidence intervals were calculated.
All statistical analyses were performed using R software (R Core Team, 2014), particularly using the packages SNPassoc, Genetics and haplo.stats. Threshold for statistical significance was set at α = 0.05.

Heterogeneity Analysis in Case-Control Groups
Main clinical and demographic data are summarized in Table 1. The CKD group of patients showed higher percentage of hypertensives (91% in CKD; 35.4% in controls), and diabetics (25.4% in CKD; 10.7% in controls). Moreover, the CKD group also included a higher percentage of males, smokers and of non-caucasians. Diabetes and hypertension are well known causes of CKD and were included in the model as potentially confounding variables. Sex, race (Caucasian/non-Caucasian) and age were also included as adjusting variables. Table S3). Thirteen SNPs showed a 100% missing genotypes in controls and were removed from the analysis (Supplementary Figure S1 and Supplementary Table S2). Thus, only 66 out of 79 initially selected SNPs were analyzed for its association to CKD. Then, we tested different inheritance models for all the SNPs, and univariate analyses to assess its association with CKD under the selected inheritance model (Supplementary Table S3). After adjusting for confounding variables, only twelve SNPs showed significant association to CKD ( Table 2). Complete genetic information about the SNPs is provided in  Qualitative data are presented as frequency (%). Fisher test is used to assess differences in the proportions between controls and CKD groups. Quantitative data are presented as mean value (standard deviation). Mann-Whitney's U-test is used to assess differences in the means between groups. SNPs, associated gene, chosen inheritance model and its p-value in the LRT adjusted by confounding variables are shown. Odds-ratio and its 95% confidence interval and the AIC computed from the unadjusted model are also displayed. Finally, permutation p-value is also included.

Supplementary
Frontiers in Genetics | www.frontiersin.org Out of the twelve selected SNPs, three pairs were in LD: rs679620 and rs35068180 (located in the MMP3 gene, chromosome 11), rs9138 and rs1126616 (located in SPP1 gene, chromosome 4), and rs3102735 and rs1564858 (located in TNFRSF11B gene, chromosome 8). These SNPs, together with rs2238135 and rs4236 (located in the chromosome 12 in the VDR and MGP genes, respectively), were considered as candidates to assess the existence of haplotypes. Supplementary Figure  S2 shows the existence of haplotypes for the three pairs of SNPs in linkage disequilibrium, but not for the pair of SNPs of chromosome 12.

Association of Combination of SNPs With CKD
To implement the model fittings, only those subjects with complete data in all of the twelve selected SNPs were considered (n = 1,585; 1,047 cases and 538 controls). Three of the 4,095 multivariate fitted models attained the minimum AIC value (2006.6; Supplementary Figure S3). This was because, under the reference and CKD risk genotype categories, rs1126616 and rs9138 were perfectly correlated, and models with one of each or both were equally considered. Arbitrarily, we selected the model that contains the first SNP (rs1126616). The multivariate model consisted in the combination of eight SNPs (rs1126616, rs35068180, rs2238135, rs1800247, rs385564, rs4236, rs2248359, and rs1564858) associated with CKD (p < 0.05) whose estimated coefficients, standard errors, unadjusted and adjusted p-values and odds-ratios are shown in Supplementary Table S6. We also generated a model using classical risk factors, adjusting by sex, race and age, and, as expected, hypertension and diabetes were found significantly associated to the prevalence of CKD (Supplementary Table S7).
In order to evaluate the possible clinical utility of the proposed model, we generated a multivariate model on which we included the eight SNPs, hypertension, diabetes and the adjusting variables (sex, age, and race). For both, adjusted and unadjusted models, the combination of the 8 SNP was statistically significant for CKD risk prediction, even when classical risk factors and adjusting variables were considered (Supplementary Table S8). Going further, we generated a model considering interactions of the eight SNPs with diabetes and with hypertension, which allowed us a backward elimination of three SNPs (rs2238135, rs385564, and rs1564858). On this process we identified an interaction of hypertension with rs2248359, whose genotype had no effects on CKD risk for non-hypertensive individuals, but for hypertensive patients the TT genotype increased the risk of CKD [p = 0.017, OR = 2 (1.23,3.32); Supplementary Figure S4].
Therefore, a multivariate model containing five SNPs, diabetes, hypertension and the interaction of hypertension with rs2248359 was generated ( Table 3). The predicted risk model is summarized in Figure 1. Thus, our model shows that in a hypothetical high-risk profile patient (male, non-Caucasian with diabetes and hypertension), the presence of the five described SNPs increases the predicted risk of being a CKD patient six times. In any case, this is a predictive tool that needs to be validated in a different cohort. Although the proportion of non-caucasian patients was low (3.61% in the complete model), we assessed the potential interactions of the significant SNPs with race. We did not find any significant interaction (results not shown). In addition, we also fitted the model with only Caucasian patients and we did not find any notable differences in the estimated parameters of the model (results not shown).
Finally, in order to evaluate the CKD predictive power of SNPs in general population, Receiver Operating Characteristic (ROC) curves for the eight SNPs model, the diabetes + hypertension Complete summary of the model containing classical risk factors, SNPs and the adjusting variables. β (SE) denotes the estimated coefficient (standard error) in the logistic regression model, p-value denotes the z-test p-value for the coefficient, OR (95%) denotes the odds-ratio (95% confidence interval).  Summary of the ROC curves for each model. Probability threshold is the chosen probability threshold obtained from the optimal cut-off for each curve, chosen as the point that maximizes the distance to the diagonal line. Sensitivity and specificity, positive and negative predicted values (PV) for that optimal threshold are displayed with 95% confidence intervals. Finally, the respective area under the curve (AUC), with 95% confidence intervals, is included. FIGURE 1 | Prediction models of CKD risk. The height of each bar corresponds to the predicted odds-ratio for that category when compared to the reference category. The value of these odds-ratio, their 95% confidence interval and their p-value are also displayed inside each bar. Above bars, odds-ratios, 95% confidence intervals and p-values comparing each possible other pair of categories are also displayed, always taking the group with less CKD risk as reference. The vertical axis is plotted in logarithmic scale. For all models, taken age was the sample median age (60).
FIGURE 2 | Receiver operating characteristic (ROC) curves of the different multivariate models explored for CKD risk prediction. ROC curves corresponding to explored multivariate models, including in all cases the adjusting variables: sex, race (Caucasian/Non-Caucasian) and age. The "SNPs" ROC curve (orange) corresponds to the model containing the 8-SNP combination and the adjusting variables. The "classical risk factors" ROC curve (green) corresponds to the model containing diabetes, hypertension and the adjusting variables. The ROC "classical risk factors and SNPs" (blue) corresponds to the model containing the 5-SNP combination, diabetes, hypertension, the interaction of hypertension with rs2248359 and the adjusting variables. For each curve, the filled diamond represents the optimal cut-off, chosen as the point that maximizes the distance to the diagonal line (dashed black line).
model and the five SNPs + diabetes + hypertension model, were calculated (Figure 2). The predictive power of the model with the five SNPs + diabetes + hypertension showed the best AUC, and this AUC increase, compared to the one of the classical risk factors model, was statistically significant (p < 0.0001; Table 4). Results of the bootstrap simulations confirmed the predictive power of the model, leading to a mean AUC of 0.823 (0.79, 0.84).

DISCUSSION
In the present study, 79 SNPs were genotyped in a large cohort of CKD patients and non-CKD subjects. After adjusting for age, sex and race, we found twelve SNPs associated to CKD, in which three pairs showed LD. We also account for additive effects among the twelve selected SNPs, finding an eight SNP combination whose additive effects were statistically significant, even when adjusting by potentially confounding variables. We also showed that the model including diabetes, hypertension, sex, age and race, and the five selected SNPs, displays a statistically significant increase in the AUC, compared with the same model without SNPs.
SPP1 codes for Osteopontin (OPN), a protein constitutively expressed in kidney (Hudkins et al., 1999) whose levels are increased in advanced stages of CKD (Lorenzen et al., 2010) and in diabetic nephropathy, where it regulates macrophage recruitment (Kelly et al., 2002). More recently, its levels have been linked to nephritis in lupus patients (Salimi et al., 2016), and OPN knock out mice showed resistance against high phosphate induced nephrocalcinosis (Paloian et al., 2016), a condition that can lead to CKD (Shavit et al., 2015). Despite these data linking OPN to CKD onset and progression, currently there are no studies defining the effect of the rs1126616 SNP on OPN expression or function.
The rs35068180 located in the gene that codes for the matrix metalloprotease 3 (also known as Stromelysin 1) has been related to diabetic nephropathy (Kure et al., 2011), and its combination with the rs1799750 located in the MMP1 gene strongly associates to end stage renal disease (Cozzolino et al., 2009). Indeed matrix metalloproteases are key factors in kidney physiology and its pathologies, as MMP inhibition improves lesions in glomerular disease, diabetic nephropathy and interstitial fibrosis (Tan and Liu, 2012).
BGLAP and MGP genes code for Ostecalcin and Matrix Gla Protein, respectively, proteins whose levels vary with CKD progression (Delmas et al., 1983;Thamratnopkoon et al., 2017). The rs4236 SNP in MGP gene has been associated with aortic calcification in Spanish men and functional studies have demonstrated that the protein expressing the minor allele is less effective in inhibiting calcium deposition in vascular smooth muscle cells (Tuñón-Le Poultel et al., 2014). The rs1800247 has been previously related to Osteocalcin levels in women (McGuigan et al., 2010;Ling et al., 2016), although we did not find any effect of this SNP on Osteocalcin levels in our population (data not shown).
CYP24A1, is a key enzyme of vitamin D metabolism, and has been extensively involved in CKD (Petkovich and Jones, 2011). In GWAS analyses, different SNPs of the CYP24A1 gene have been related to vitamin D levels (Manousaki et al., 2017;Jiang et al., 2018), to calcium levels (O'Seaghdha et al., 2013), and even to glomerular filtration rate (Mahajan et al., 2016;Pattaro et al., 2016), but to the best of our knowledge, this is the first time that the rs2248359 polymorphism has been associated to CKD.
It is worth mentioning that previous GWAS analyses for genes involved in CKD (Köttgen et al., 2009(Köttgen et al., , 2010 did not find any association of the SNPs described here; that could reflect different frequencies of the polymorphisms among the United States based cohorts and our Spain based cohort. Validation of our results in another Spanish cohort would strengthen our results and, due to the changing dynamics of Spanish population, would be desirable the inclusion of non-caucasic genetic background to better replicate the Spanish multicenter NEFRONA cohort. The fact that a small amount of patients presented the five SNPs in our cohort also limits the validation of our model. It is also remarkable that our model needs a minimum of five SNPs to improve clinical parameters prediction; a costeffectiveness analysis would be desirable to test feasibility of clinical implantation of our results. A second limitation is that only 1603 patients presented complete data for all the genotypes, so the final multivariate model was fitted only in those. Despite these limitations, the strength of this study is that we analyzed seventy-nine SNPs in relatively large cohort of CKD patients, and after adjusting for potential confounding variables, we found 5 SNPs related to CKD-MBD that might aid in determining CKD risk.

ETHICS STATEMENT
The protocol of the study was approved by the ethics committee of the Hospital Universitari Arnau de Vilanova (Lleida) and all patients were included after signing informed consent. This research followed the principles of the Declaration of Helsinki.

AUTHOR CONTRIBUTIONS
JV and CP-G performed statistical analyses. SC wrote the manuscript. JMV, EF, IR, JV, and SC conceptualized the design of the study. EF, MB-L, MB, and ÀB provided valuable feedback for manuscript writing and experimental design. All authors contributed to manuscript revision, read and approved the submitted version.

ACKNOWLEDGMENTS
The authors would like to thank the NEFRONA team (Eva Castro, Virtudes María, Teresa Molí, Teresa Vidal, Meritxell Soria) and the Biobank of REDinREN for their invaluable support. The NEFRONA study investigator group is composed by the following: Aladrén Regidor,