DNA Methylation Associated With Diabetic Kidney Disease in Blood-Derived DNA

A subset of individuals with type 1 diabetes will develop diabetic kidney disease (DKD). DKD is heritable and large-scale genome-wide association studies have begun to identify genetic factors that influence DKD. Complementary to genetic factors, we know that a person’s epigenetic profile is also altered with DKD. This study reports analysis of DNA methylation, a major epigenetic feature, evaluating methylome-wide loci for association with DKD. Unique features (n = 485,577; 482,421 CpG probes) were evaluated in blood-derived DNA from carefully phenotyped White European individuals diagnosed with type 1 diabetes with (cases) or without (controls) DKD (n = 677 samples). Explicitly, 150 cases were compared to 100 controls using the 450K array, with subsequent analysis using data previously generated for a further 96 cases and 96 controls on the 27K array, and de novo methylation data generated for replication in 139 cases and 96 controls. Following stringent quality control, raw data were quantile normalized and beta values calculated to reflect the methylation status at each site. The difference in methylation status was evaluated between cases and controls; resultant P-values for array-based data were adjusted for multiple testing. Genes with significantly increased (hypermethylated) and/or decreased (hypomethylated) levels of DNA methylation were considered for biological relevance by functional enrichment analysis using KEGG pathways. Twenty-two loci demonstrated statistically significant fold changes associated with DKD and additional support for these associated loci was sought using independent samples derived from patients recruited with similar inclusion criteria. Markers associated with CCNL1 and ZNF187 genes are supported as differentially regulated loci (P < 10–8), with evidence also presented for AFF3, which has been identified from a meta-analysis and subsequent replication of genome-wide association studies. Further supporting evidence for differential gene expression in CCNL1 and ZNF187 is presented from kidney biopsy and blood-derived RNA in people with and without kidney disease from NephroSeq. Evidence confirming that methylation sites influence the development of DKD may aid risk prediction tools and stimulate research to identify epigenomic therapies which might be clinically useful for this disease.


INTRODUCTION
Diabetes and associated complications are major personal and public health concerns, with diabetic kidney disease (DKD) contributing a substantial financial burden to healthcare providers (Franciosi et al., 2013;Campbell et al., 2017;Arredondo et al., 2018;Disease et al., 2018;Kawaguchi et al., 2020). DKD develops in approximately one-third of individuals with diabetes and remains the most common primary diagnosis of chronic kidney disease (CKD) leading to end-stage kidney disease (ESKD) worldwide (Hill et al., 2014;US Renal and Data System., 2018;Lassalle et al., 2019;UK Renal Registry, 2019). Current treatments are based on modification of risk factors and include the reduction in elevated blood pressure, hyperglycemia and hyperlipidemia. Epidemiological evidence confirms that heritable factors play a major role in the development and progression of DKD, but despite the identification of several genetic loci associated with DKD most of the inherited risk factors remain unknown (Sandholm et al., 2012(Sandholm et al., , 2013(Sandholm et al., , 2014(Sandholm et al., , 2017Canadas-Garre et al., 2018, 2019van Zuydam et al., 2018;Fu et al., 2019;Salem et al., 2019).
Differential DNA methylation has been associated with "metabolic memory" of glycemic control and with a higher risk of developing DKD (Swan et al., 2015;Keating et al., 2018;Aranyi and Susztak, 2019;Gluck et al., 2019;Gu, 2019;Jia et al., 2019;Park et al., 2019). As DKD is not clinically detectable until significant organ damage has developed (albuminuria and/or reduced eGFR), more effective diagnostic tools and treatments, guided by a better understanding of pathophysiology, are urgently required. The identification of novel epigenetic risk markers and biological pathways influencing DKD would contribute more to the understanding of this serious diabetic complication. This manuscript describes an EWAS from carefully phenotyped individuals who were specifically recruited to investigate molecular risk factors for DKD in people with type 1 diabetes, followed by in silico and de novo replication. Our overarching aim is to identify differentially methylated CpG sites associated with DKD.

Participants
All recruited individuals provided written, informed consent and this study was approved by a United Kingdom Multicentre Research Ethics Committee (MREC/98/6/71). The discovery group comprised a subset of individuals (150 cases compared to 100 controls) selected from an established United Kingdom casecontrol collection that was recruited specifically to investigate risk factors for DKD (McKnight et al., 2010). One-third of the case group had end stage renal disease (ESRD). All individuals were White, from the United Kingdom and were diagnosed with type 1 diabetes prior to the age of 31 years. Participants in the case group had persistent proteinuria (>0.5 g protein/24 h) at least 10 years after diagnosis of diabetes, hypertension (BP > 135/85 mmHg or treatment with antihypertensive agents) and diabetic retinopathy. Individuals in the control group had at least 15 years duration of type 1 diabetes with normal renal function and were not receiving antihypertensive treatment. Cases and controls in the discovery group were matched for age, gender and duration of diabetes. The in silico replication groups comprised independent samples from the remainder of this collection and had similar characteristics to those involved in the discovery group. Similarly, all individuals included in the de novo typing phase were selected using blood-derived DNA from the larger collection (Table 1).

Methylation Typing
482,421 unique CpG features were evaluated in the discovery group. Existing blood-derived DNA [extracted using the saltingout method as previously described (Bell et al., 2010)] was accurately quantitated using PicoGreen R , normalized, and bisulfite treated using the EZ-96 DNA Methylation-Gold TM Kit (Zymo Research, Irvine, CA, United States) with case and control samples randomly distributed across plates. The Infinium Human Methylation 450K BeadChip (Sandoval et al., 2011) (Illumina Inc., San Diego, CA, United States) was employed according to manufacturer's instructions. Raw data were adjusted for dye bias and quantile normalized at the probe level with data derived from sites using Infinium I or Infinium II assay chemistry considered separately. This high-throughput platform enables quantitative evaluation of methylation levels with single nucleotide resolution, generating a methylation score per individual (a β value ranging from 0 for unmethylated to 1 representing complete methylation) for each CpG site.
In silico support for candidate loci was sought using normalized data available at the Gene Expression Omnibus accession GSE20067 (Bell et al., 2010). We previously analyzed 27,578 CpGs based on data from the Illumina Infinium R HumanMethylation27 BeadChip in 192 individuals diagnosed as having type 1 diabetes with and without nephropathy using time to event (duration of diabetes until nephropathy) analysis (Bell et al., 2010). De novo replication was performed using Sequenom Epityper assays (Sequenom Inc., Hamburg, Germany). Sequenom facilitates the quantitative analysis of DNA methylation using matrix-assisted laser desorption ionization time-of-flight mass spectrometry. Assays were designed using the default settings (except mass window range changed to 1500-8000) at www.epidesigner.com. Amplicons were carefully designed to cover target sites, results were generated following the manufacturer's protocol and data analyzed using Epityper viewer 1.2 (Sequenom). An additional 235 independent individuals were typed as part of the replication population for de novo confirmation with the threshold for significance set at P < 0.05 without adjustment for multiple testing ( Table 1).

Analysis
For the discovery cohort, stringent quality control included evaluation of bisulfite conversion efficiency, staining, hybridization, target removal, extension, dye specificity and 600 integral negative controls. Samples were excluded where more than 10% of probes did not generate useful data and all sites with poor detection P-values (detection P ≥ 0.05) were set to "missing." Known non-CpG targeting probes (n = 3,091) were excluded from all results (Chen et al., 2013;Zhou et al., 2017). Probes on autosomes were evaluated for association with DKD adjusted for sex, duration of diabetes, age at diagnosis, and HBA 1c with subgroup analysis performed for cases with ESRD. Sex-specific analysis was performed for probes on chromosome X. Converting intensity levels to beta values and initial preprocessing were performed using the default settings within GenomeStudio's methylation module v1.9 (Illumina) (Smyth et al., 2014b). Principal component analysis and multi-dimensional scaling were employed and potential outliers from gender, non-White ethnicity or experimental batch effects were excluded from further investigation. Proportional white cell counts from whole blood (B cells, granulocytes, monocytes, NK cells, and T cells subsets) were estimated using Houseman's and Reinius' approaches (Houseman et al., 2012;Reinius et al., 2012). Microarray quality control metrics reports were generated using the arrayQualityMetrics package with the recommended parameters in Bioconductor (Kauffmann and Huber, 2010); arrays that did not pass the default quality control thresholds were excluded from further analysis. After correction for dye bias, raw data were normalized using quantile normalization using methylumi 1 . The Bioconductor package Limma (Wettenhall and Smyth, 2004;Ritchie et al., 2015) was used to generate association results. Significance values for the 450K array-based analysis were adjusted for multiple testing using the Benjamini & Hochberg, "fdr" adjustment, implemented in limma and reported as adjusted p-values.
For in silico replication, previously generated association data from the Illumina Infinium R HumanMethylation27 BeadChip for 192 individuals diagnosed as having type 1 diabetes with and without kidney disease (Bell et al., 2010) was "looked up." For de novo replication, new genotype data was generated using Sequenom's MassARRAY R System. Sequenom-based data were analyzed by the large samples z test statistic with regression analysis to adjust for sex, duration of diabetes, age at diagnosis, and HBA 1c. The area under the receiver operating characteristic (ROC) curve was generated using SPSS (version 15) to assess the ability of a CpG to distinguish between cases and controls.
In silico functional support for replicated loci was sought from datasets within NephroSeq (last accessed 14th September 2020) 2 for publicly available gene expression data using a p-value threshold of 0.05 and a fold change of at least 1.5.
To investigate if SNPs from our GWAS for DKD were located near differentially methylated CpG probes, SNPs in key chromosomal locations (5 kb flanking top ranked markers or their associated genes) were evaluated for association with DKD using publicly available data from our genome-wide association study (dbGAP phs000389.v1.p1); for SNP-based association analysis of this GWAS data, P-values were corrected by genomic control and adjusted for age at diagnosis, duration of diabetes, gender, biochemistry center and the first ten components of the study specific principal component analysis (Sandholm et al., 2012).
Provisionally significant genes (adjusted P < 0.0001) were analyzed for enrichment of KEGG pathway membership; enrichment was assessed for genes that showed increased or decreased methylation values separately. Additionally, DAVID Bioinformatics Resources 6.7 was interrogated for top-ranked gene results in the genetic association database, Online Mendelian Inheritance in Man, Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways and articles recorded in PubMed (Huang da et al., 2009).

RESULTS
Greater than 99% concordance was observed between duplicate samples and experimentally defined genders matched each individual submitted for analysis based on Y-chromosome specific loci. Experimental controls generated expected results, but 5 arrays were identified as outliers and were thus removed from all subsequent analyses. There was no significant difference in the estimated proportional white cell counts between case and control groups (Supplementary Table 1). In silico adjustment for white cell composition does not alter the top-ranked genes associated with DKD in this study. As the discovery collection included 250 individuals, only CpGs with a sizable, significant difference in DNA methylation ( β) were considered for subsequent analysis (Figure 1). Twenty-two unique sites were identified with β ≥ 0.2 and P < 10 −8 where significance values were adjusted for multiple testing using the Benjamini & Hochberg method for controlling the false discovery rate (FDR). These sites are primarily "promoter associated" and affect 22 genes ( Table 2); all 22 sites were taken forward for in silico and wet-lab replication. Subgroup analysis comparing those individuals who received a kidney transplant for DKD with controls show that 20 of these 22 sites were also highlighted in the ESRD focused dataset ( Table 2). Sex-specific analyses for cg26399113 on the X chromosome was significant from female only (P = 1 × 10 −12 ) and male only (P = 8.9 × 10 −11 ) analyses. The discovery group generated results for 15 CpGs in the CCNL1 gene, of which 12 showed an increase in methylation between cases and controls. Similarly for ZNF187, nine CpGs were examined of which eight showed differential methylation and all were in the same direction as the index marker.
Several SNPs from our British -Irish genome-wide association study revealed nominal significance for SNPs in the region of interest from the 450K discovery analysis. The SNP demonstrating most evidence for association with DKD within 5 kb flanking top-ranked methylation probes are presented in Supplementary Table 3. Rs16888186 showed the most evidence for association in the DST gene with P = 0.0008.
In silico replication data were available for six probes on the 27K array (cg25418748, cg07979357, cg21935083, cg21829265, cg24115040, cg26399113), but none of these sites confirmed differential methylation levels between cases and controls. De novo replication gave z test P-values supporting the association of two loci: cg02399570 on 3q25.31 in the CCNL1 gene, P = 7.2 × 10 −14 and cg04861640 on 6p21.31 in the ZNF187 gene, P = 1.5 × 10 −15 (Figure 2). Selecting a β level of 25% (as standard for unmethylated loci), the area under the ROC curve was 0.74 for cg02399570 and 0.81 for cg04861640 (Figure 2).
Multiple significant analyses for CCNL1 gene expression associated with kidney function were returned from NephroSeq (Supplementary Figure 1). Higgins et al. (2004) observed that CCNL1 is more highly expressed in renal glomeruli than other FIGURE 1 | Manhattan plot showing distribution of delta beta values for uniquely mapped sites across all chromosomes. The blue line discriminates sites that have suggestive differences in methylation between cases with nephropathy compared to non-nephropathic individuals in the control group. Circles above the red line are loci where a substantial difference in beta values (>0.2) were observed and the two markers supported in the replication group are highlighted in green. Markers that are not uniquely mapped to a chromosome position based on Illumina's updated bead pool manifest are assigned to chromosome 1 in this figure.
Frontiers in Cell and Developmental Biology | www.frontiersin.org TABLE 2 | Details for markers showing significant (FDR adjusted P-value), substantial differences in mean methylation ( β) between case and control groups in the discovery phase.

Locus
Sequence position  control participants, observing only a small change in expression for CCNL1 (P = 0.02). Multiple significant analyses were returned from NephroSeq ZNF187 kidney gene expression data from dissected renal lobes of five adult human kidneys using cDNA microarrays representing ∼30,000 different human genes (Higgins et al., 2004). ZNF187 is more highly expressed in renal glomeruli than other renal tissues (Higgins et al., 2004;Lindenmeyer et al., 2010). The most significant result returned from searching NephroSeq for ZNF187 was for tissue type [P = 1.7 × 10 −5 comparing tubulointerstitium to glomerular tissue in six transplant living donors (Lindenmeyer et al., 2010)] and acute rejection following kidney transplantation from 48 patients [P = 2.8 × 10 −5 (Sarwal et al., 2003)]. Gene expression changes were associated with glomerular filtration rate in kidney biopsy samples from people with IgA nephropathy (P = 6.27 × 10 −4 ) (Reich et al., 2010), diabetes (P = 0.008) (Woroniecka et al., 2011), and blood derived gene expression from healthy individuals who had no evidence of kidney disease (P = 0.018) (Flechner et al., 2004). Focusing in on available data from 22 racially diverse microdissected human kidney samples with type 2 DKD measured on the Affymetrix U133A 2.0 array, visualizes decreased renal function associated with a decrease in ZNF187 gene expression (P = 0.008) (Woroniecka et al., 2011).

DISCUSSION
Using the Infinium Human Methylation 450K BeadChip (Sandoval et al., 2011) we identified differentially methylated CpG sites associated with DKD. To minimize false positive associations in our discovery cohort, we applied stringent quality control and adjusted association analyses; we used both a genome-wide significance threshold and a clear absolute methylation difference ( β ≥ 0.2) to minimize artifactual associations and ensure the selection of differentially methylated CpG probes (Dedeurwaerder et al., 2014). No SNPs are reported to affect methylation probes for ZNF187, however, one SNP resides 40 bases from the 3 end of the probe for the CCNL1 gene. This SNP (rs75624594, NP_064703.1:p.His112 = His, synonymous coding) has a reported minor allele frequency of 50% in African individuals, but only 4% in the US NHLBI Exome Sequencing Project (dbSNP ss342150967) 3 so is unlikely to account for the differential methylation observed in this study.
The majority of probes evaluated using the 27K array were also present on the 450K array (n = 25,978), and demonstrate good correlation for many, but not all, CpG probes. For this reasons we sought supporting data from a previous study exploring DKD using the 27K array. It is clearly only possible to seek support for top-ranked markers that were present across both the 27K and 450K arrays and unfortunately data was only available for seven probes, none of which supported an association. Acknowledging the limitations of the 27K array to support top-ranked markers in this study (primarily that the majority of our top-ranked markers were not present on that much smaller array) and in the absence of a replication cohort with 450K data available for a population with similar phenotype characteristics, we sought independent replication. Pragmatically, independent replication was sought using all available samples with high molecular weight DNA (139 cases and 96 controls) for the 22 loci that were significantly associated with DKD from the 450K arraybased discovery analysis using β ≥ 0.2 and FDR adjusted P < 10 −8 thresholds. Importantly, the replication population was recruited with similar phenotypic characteristics to that of the discovery cohort and we used a completely different wetlab approach (mass spectrometry) to validate the microarraybased data, which minimizes artifacts due to the microarray analysis. Only two of these CpG sites were supported by the new methylation data generated in our replication populationthis is not unusual in genome-wide studies and may be due to false positives in the original EWAS or due to the fact that our replication population was not sufficiently powered to identify significant associations with all loci. Both discovery and de novo replication using a different experimental platform (mass spectrometry by Sequenom analyzed using EpiTyper software) to generate new laboratory data and an independent population strongly support the association of CCNL1 and ZNF187 genes with methylome-wide significance from 450K array-based (microarray by Illumina's iScan) discovery association results. RNA-based gene expression data also supports a functional influence of CCNL1 and ZNF187 in kidney disease.
Methylation patterns for CCNL1 and ZNF187 genes showed a striking higher methylation level for controls compared to individuals in the case group (Figure 2). It should be noted that DNA samples available for replication were not age and gender matched in this study, rather they were pragmatically selected as all available samples with high quality DNA and careful phenotyping. Cases were older, diagnosed with type 1 diabetes later, and had higher HBA 1c values with a longer duration of diabetes than individuals in the control group. Differential methylation between cases and controls was significant in both the CCNL1 and ZNF187 genes following adjustment for age, duration of diabetes, HBA1c and gender.
The CCNL1 gene encodes cyclin L1, which is localized in nuclear speckles (splicing factor storage compartment) (Herrmann et al., 2007), is functionally related to the spliceosome, and is involved in pre-mRNA splicing activities (Chen et al., 2007;Tannukit et al., 2008). The transcription start site for CCNL1 is located 89 bp upstream of the initiation codon and the first two exons overlap the CpG island (Dickinson et al., 2002; Figure 2). CCNL1 was consistently hypomethylated in cases compared to controls. Overexpression (usually associated with hypomethylation) of CCNL1 has been associated with cancer (Sticht et al., 2005;Mitra et al., 2010;Peng et al., 2011).
The discovery group generated results for 15 CpGs in the CCNL1 gene, of which 12 showed an increase in methylation between cases and controls. Replication using independent samples and a different wet-lab experimental approach supported the association of CCNL1 with DKD. Based on published gene expression data from the Affymetrix R GeneChip R Whole Transcript Expression Arrays, CCNL1 was one of four genes differentially expressed in patients with kidney stones compared to controls (2.6 fold change, downregulated, P = 6.58E-05) (Liang et al., 2019). Based on gene expression data within NephroSeq, differential gene expression was observed for CCNL1 in kidney biopsy tissues from people with kidney disease compared to controls (Gunther et al., 2014;Ju et al., 2015;Nakagawa et al., 2015); no adjustment was made for cell heterogeneity in the disease compared to control collections for these gene expression datasets. CCNL1 resides in chromosome band 3q25, which has been previously suggested to harbor risk loci for DKD (McKnight et al., 2009). Of particular interest, the CCNL1 gene was ranked 4th from a meta-analysis for association with severe diabetic retinopathy (P = 7.1 × 10 −7 ), but was no longer topranked for association with diabetic retinopathy when individuals with nephropathy were removed from the case group (Grassi et al., 2011). Subsequent studies have highlighted CCNL1 SNPs associated with retinopathy and measures of renal function (Lin et al., 2016). Genetic variation near the CCNL1 gene is robustly associated with low birth weight in European individuals (Freathy et al., 2010;Yaghootkar and Freathy, 2012;Horikoshi et al., 2013), specifically with growth restriction from early pregnancy onward (Mook-Kanamori et al., 2011). Another study suggests that individuals who carry a risk allele for rs900400 (near CCNL1) are more vulnerable to stress impacting on birth weight (Ali Khan et al., 2012). The relationship between birth weight and kidney disease has been debated with some groups suggesting that low birth weight is a risk factor for DKD (Rossing et al., 1995) while others report that low birth weight does not increase the risk of DKD (Fagerudd et al., 2006). The birth weight lowering effect rs900400 C allele has also been associated with increased insulin response following oral glucose stimulation in a meta-analysis based on Danish and Finnish non-diabetic individuals (Andersson et al., 2011). Published literature suggests that CCNL1 may affect an individual's inherited and dynamic responses to their environment, perhaps reflecting both genetic and epigenetic contributions. While the molecular mechanism for CCNL1 influencing DKD remains to be resolved, this is clearly a candidate gene that warrants further investigation having demonstrated genetic (SNPs), epigenetic (methylation) and transcriptomic (gene expression) associations with kidney disease across multiple collections.
In the discovery cohort, nine CpGs were examined for ZNF187; eight of these showed differential methylation and all were in the same direction as the index marker. Replication demonstrated strong support for association of ZNF187 with DKD. Analysis within NephroSeq revealed gene expression changes associated with glomerular filtration rate in kidney biopsy and blood-derived samples from people with IgA nephropathy (Reich et al., 2010), diabetes (Woroniecka et al., 2011), and renal function in healthy individuals (Flechner et al., 2004 ; Supplementary Figure 1). ZNF187 is involved with transcriptional regulation, but there are few publications describing this gene. The ZNF187 gene is was located at 6p21.31 and encodes the zinc finger protein 187. Gene ontology suggests that ZNF187 is involved with transcriptional regulation, but there are no specific publications for this gene (or any aliases) in PubMed (accessed 12/05/20) 4 . The protein coding ZNF187 gene has been renamed as ZSCAN26 (zinc finger and SCAN domain containing 26), but we have retained the ZNF187 nomenclature throughout this manuscript to keep the methylation array information consistent and facilitate easier searching and replication within methylation and gene expression datasets.
Both CCNL1 and ZNF187 were more highly expressed in renal glomeruli than other renal tissues, which may be consistent with DKD in people with type 1 diabetes primarily affecting the glomerulus (Higgins et al., 2004). The results for both significantly replicated genes in our methylation dataset are hypomethylated in cases compared to controls, while publicly available gene expression evidence for both genes suggests there is less RNA product in samples from people with kidney disease. This may be because less gene is expressed, or because the mRNA measured on arrays is not present for long, or because different isoforms are not captured by the gene expression array. cg02399570 is in the body of CCNL1 while cg04861640 is within the transcription start site of ZNF187. Most promoters and CpG sites in gene bodies that are hypomethylated are basically expressed, but gene regulation is complex and genes may be hypomethylated and "overexpressed" in disease states (Haney et al., 2016;Li et al., 2017). Significant further research is required in populations that have blood derived, kidney biopsy derived and in vitro models with SNP, CpG, and gene expression data available for the same individuals to tease out the molecular signatures of these genes (CCNL1 and ZNF187) for kidney disease.
Pathway analysis revealed significant gene enrichment in the focal adhesion, Wnt and MAPK signaling pathways. These pathways have been previously top-ranked as differentially regulated in renal tubuli of individuals with DKD compared to healthy tissue from living kidney donors (Woroniecka et al., 2011). The focal adhesion pathway also demonstrates enrichment in the glomeruli of individuals with DKD (Woroniecka et al., 2011) and is a key molecular pathway in the formation and progression of the cardiorenal system (Muhlberger et al., 2012). The Wnt signally pathway has been shown to influence survival of glomerular mesangial cells exposed to high glucose (Lin et al., 2006) (41) and dysregulation of the Wnt pathway may represent and important pathogenic mechanism of DKD (Kavanagh et al., 2011;Zhou et al., 2012).
We incorporated existing GWAS data (Sandholm et al., 2012) with this novel methylation data to provide exploratory analysis seeking a provisional assessment of functionality. i.e., are SNPs demonstrating suggestive association with DKD from GWAS near CpG probes that are differentially methylated 4 www.ncbi.nlm.nih.gov/pubmed using the same case-control study population. As SNPs may have a functional role for each gene at a considerable genetic distance, we pragmatically selected 5 kb flanking each probe for this analysis. No strong associations were identified for SNPs near top-ranked differentially methylated genes. We also considered a similar analysis by looking up CpG sites for association near top-ranked GWAS SNPs (Sandholm et al., 2012). Meta-analysis of genome-wide association studies for DKD revealed novel association (P = 1.2 × 10 −8 ) with SNPs in the AFF3 gene, a transcriptional activator that influences renal fibrosis through the TGFβ1 pathway, in individuals who had progressed to end stage renal disease (Sandholm et al., 2012). Additionally, the top-ranked marker associated with DKD was in the ERBB4 gene (P = 2.1 × 10 −7 ) (Sandholm et al., 2012). Significant differential methylation was observed at both of these loci when comparing cases and controls on the 450K array (Table 3) suggesting that a combined genetic-epigenetic factor may influence the risk of DKD.
The role of epigenetics in common, complex diseases is beginning to be unraveled at a population level using relatively high throughput tools. Evidence is increasing that interindividual epigenetic variation, in particular DNA methylation, may help explain some of the "missing heritability" that has not been identified through genome-wide association and resequencing approaches. Illumina's 450K BeadChip was proposed as the method of choice for cost-effective, high throughput epigenome-wide association studies with single-base resolution (Rakyan et al., 2011b). The content of this array (485,764 sites distributed across all chromosomes) was selected based on input from 22 methylation experts across the world. Included are unique markers that cover 99% of RefSeq genes with an average of 17 CpG sites per gene region distributed across the promoter, 5 untranslated region, first exon, gene body, and 3 untranslated region. This array also includes dedicated content for CpG sites outside CpG islands and microRNA promoter regions. Illumina have released a higher density EPIC array, the Infinium MethylationEPIC BeadChip, which facilitates evaluation of 862,927 sites at significantly increased financial cost. More comprehensive analysis of the methylome may be conducted through whole-methylome-sequencing, but this is financially prohibitive for most researchers using cohorts of more than 200 participants. However, technological and analytical advances now offer the potential for targeted, high throughput bisulfite sequencing with deep coverage as an attractive option for technical validation and replication. Using high density methylation arrays is currently the most cost-effective approach for EWAS using population-based study designs. Stringent quality control, strong significance values, and independent replication are essential to minimize false positive findings when investigating sequence changes to elucidate the genetic architecture of multifactorial disease.
Although this epigenetic study does not include the large sample numbers traditionally associated with genome-wide association studies, we have applied rigorous analysis approaches throughout and gained support using independent samples by a different technology -important to minimize technical artifacts. A sample size of only 65 heavy smokers and 56 non-smokers was sufficient to identify differential patterns of methylation (P = 2.68 −31 ) associated with smoking using the Illumina Human Methylation 27K BeadChip (Breitling et al., 2011). Similarly, a type 1 diabetes-methylation variation position signature was detected by assaying a relatively modest number of samples (n = 15 monozygotic twin pairs discordant for type 1 diabetes) on the 27K array (Bell et al., 2010;Rakyan et al., 2011a). Associations using the 450K array have been reported from 11 cell lines for rheumatoid arthritis (Nakano et al., 2013), 48 individuals for irritable bowel disease (Harris et al., 2012), and 165 females for alcohol use (Philibert et al., 2012), although study sizes are now increasing. Previous studies exploring blood-derived DNA methylation used an EWAS approach for DKD focused on the 27K array (Sapienza et al., 2011), the mitochondrial genome (Swan et al., 2015) or renal function decline in 181 Pima Indians with diabetes (Qiu et al., 2018). Targeted DNA methylation studies have been conducted using blood-derived DNA (Aldemir et al., 2017;Smyth et al., 2018), cell models of DKD (Brennan et al., 2010;Li et al., 2019) and an EWAS reported on kidney biopsy samples from 91 individuals of whom 45% had diabetes  and 11 individuals with diabetes (Ko et al., 2013). The case and control population employed in this study has >60% power to detect a true positive (defined as detected CpGs with a meaningful difference in mean blood derived DNA methylation ±0.2 with a false discovery rate P ≤ 0.05) association using pwrEWAS (Graw et al., 2019). More comprehensive whole genome bisulfite sequencing has been reported for kidney biopsy samples from five individuals with DKD compared to one person with diabetes without kidney disease, and four people with neither diabetes nor kidney disease . Careful phenotyping is critically important for methylation studies, as is the consistent extraction and storage of DNA. We and others have previously demonstrated that differences in DNA extraction approaches and storage methods significantly alter methylation profiles. Importantly for this study, all DNA was extracted using the same approach in the same laboratory (by two persons) with extracted DNA stored at −80 • C in multiple aliquots with only one freeze-thaw cycle. Individuals were carefully phenotyped by consultant nephrologists using internationally agreed phenotype criteria. We restricted analysis to individuals with type 1 diabetes and known kidney function to minimize phenotypic heterogeneity and used a matched design for the discovery population.
Many cell types have unique methylation profiles so where possible it is important to adjust for cell heterogeneity in all studies using blood-derived or kidney-derived DNA. In our study, there was no significant difference in proportional white cell counts from whole blood (B cells, granulocytes, monocytes, NK cells, and T cells subsets) and adjusting for cell composition does not change the top-ranked association results for this study. While adjusting for white cell subpopulations is critically important for cancer studies, immune-mediated responses, and case-control approaches not matched for age and gender, this is a result that we and others have reported previously for carefully phenotyped populations with stringent wet-lab protocols from blood sampling through to array scanning. The proportion of these cell types may also reflect changing disease pathology (Lappalainen and Greally, 2017;Johnson et al., 2020). Epigenetic signatures may display tissue specificity linked to disease mechanisms, however, obtaining kidney biopsy material is invasive and is not performed as part of routine clinical practice in people with DKD and T1D in the United Kingdom. Peripheral blood-based methylation biomarkers have shown promise in several clinical fields (Moore et al., 2014;Agha et al., 2019;Cardenas et al., 2019;DiTroia et al., 2019;Henderson-Smith et al., 2019;Kerr et al., 2019aKerr et al., ,b, 2020Ladd-Acosta and Fallin, 2019;Zhou et al., 2019) including kidney disease (Smyth et al., 2014b(Smyth et al., , 2018Swan et al., 2015;Aranyi and Susztak, 2019;Gluck et al., 2019;Kato and Natarajan, 2019;Kerr et al., 2019b;Park et al., 2019). We have previously demonstrated that blood-derived differential methylation is also reflected in kidney-derived differential methylation for CKD (Smyth et al., 2014b). Blood-derived DNA methylation offers clinical utility in biomarker development, incorporating a minimally invasive approach that could be cost-effectively implemented in a routine clinical setting. Indeed, ROC curve analysis suggests that 25% methylation for the two key CCNL1 and ZNF187 markers is reasonably good at differentiating individuals in case and control groups. This is particularly critical for translation in DKD given the difficulty obtaining serial renal biopsies to establish diagnosis, track progression, and monitor response (Kim et al., 2018). Array-based approaches using blood-derived DNA have previously identified risk factors and biomarkers associated with complex phenotypes (Sandholm et al., 2013(Sandholm et al., , 2017Keating et al., 2018;Canadas-Garre et al., 2019;Gu, 2019;Park et al., 2019).
Interpreting epigenetic factors as disease-causing or consequences of disease processes, alongside genetic and/or environmental heterogeneity, are a significant problem for complex disease. Nevertheless, this study demonstrates that using high density methylation arrays are an appropriate, costeffective tool to identify differential methylation profiles that may deliver minimally invasive biomarkers that are relevant for diabetic complications. We have identified CCNL1 and ZNF187 as differentially methylated genes associated with DKD in multiple cohorts. Larger EWAS exploring more markers with larger sample sizes will deliver the same gains identifying molecular biomarkers as has been observed for GWAS in recent years. Using longitudinal cohort designs will allow researchers to observe how DNA methylation changes over time. More complex analytical tools are being developed for DNA methylation such as MethylNet (Levy et al., 2020), which offers further opportunities for novel discoveries and improved understanding. The integration of multi-omic profiling will lead to a better understanding of inherited susceptibility to DKD and biomarkers for this common disease.

DATA AVAILABILITY STATEMENT
The data supporting the conclusion of this article will be made available by the authors, without undue reservation.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the South and West Multicentre Research Ethics Committee (MREC/98/6/71). The patients/participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
AMcK had full access to the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis. LS, AM, and AMcK conceived and designed the study. LS and ES performed sample analysis. CP performed the statistical analyses in the replication group. All authors provided important intellectual content and agreed the final version of this manuscript.