In Silico Enhanced Restriction Enzyme Based Methylation Analysis of the Human Glioblastoma Genome Using Agilent 244K CpG Island Microarrays

Genome wide methylation profiling of gliomas is likely to provide important clues to improving treatment outcomes. Restriction enzyme based approaches have been widely utilized for methylation profiling of cancer genomes and will continue to have importance in combination with higher density microarrays. With the availability of the human genome sequence and microarray probe sequences, these approaches can be readily characterized and optimized via in silico modeling. We adapted the previously described HpaII/MspI based Methylation Sensitive Restriction Enzyme (MSRE) assay for use with two-color Agilent 244K CpG island microarrays. In this assay, fragmented genomic DNA is digested in separate reactions with isoschizomeric HpaII (methylation-sensitive) and MspI (methylation-insensitive) restriction enzymes. Using in silico hybridization, we found that genomic fragmentation with BfaI was superior to MseI, providing a maximum effective coverage of 22,362 CpG islands in the human genome. In addition, we confirmed the presence of an internal control group of fragments lacking HpaII/MspI sites which enable separation of methylated and unmethylated fragments. We used this method on genomic DNA isolated from normal brain, U87MG cells, and a glioblastoma patient tumor sample and confirmed selected differentially methylated CpG islands using bisulfite sequencing. Along with additional validation points, we performed a receiver operating characteristics (ROC) analysis to determine the optimal threshold (p ≤ 0.001). Based on this threshold, we identified ∼2,400 CpG islands common to all three samples and 145 CpG islands unique to glioblastoma. These data provide general guidance to individuals seeking to maximize effective coverage using restriction enzyme based methylation profiling approaches.


INTRODUCTION
Cytosine methylation of CpG dinucleotides represents a major type of epigenetic modifi cation (Worm and Guldberg, 2002). Since CpG dinucleotides cluster in genomic regions called CpG islands that frequently overlap with promoter regions, one of the effects of cytosine methylation is the inhibition of gene transcription by enabling interference via methyl-CpG binding proteins (Boyes and Bird, 1992). As a potential mechanism for silencing tumor suppressors or other important genes, aberrant CpG island methylation has gained increasing recognition for its role in cancer (Jones and Baylin, 2002). One of the most prominent examples is the recent discovery that promoter methylation of O 6 -methylguanine-DNA methyltransferase (MGMT) is associated with improved treatment response of glioblastoma multiforme (GBM) patients to the alkylating agent temozolomide (Hegi et al., 2005). Since MGMT is a major repair enzyme for temozolomide-induced DNA damage, the presumed mechanism responsible for the benefi t of promoter methylation is loss of repair activity via gene silencing of MGMT. In addition to MGMT, single CpG island/promoter approaches have identifi ed a number of other In silico enhanced restriction enzyme based methylation analysis of the human glioblastoma genome using Agilent 244K CpG Island microarrays been performed on cancer genomes including osteosarcoma using Me-DIP (Sadikovic et al., 2008), pancreatic adenocarcinoma using methylated CpG island amplifi cation (Omura et al., 2008), prostate, colon, and breast cancer using methylated CpG island amplifi cation (Chung et al., 2008), and lung squamous cell carcinoma using MIRA (Rauch et al., 2008).
Thus far, the most high profi le genome-wide methylation survey of the glioblastoma genome has been led by The Cancer Genome Atlas (TCGA) Research Network (2008). TCGA surveyed CpG dinucleotide methylation in a large group of GBM patient samples using the Illumina GoldenGate Assay Platform to accompany copy number and gene expression microarray analysis. The major limitation of this platform is the relatively limited coverage, with only 1,500 CpG sites independently evaluated, corresponding to approximately 1,200 genes. More recently, data generated from the Illumina HumanMethylation27 platform providing methylation information on 27,000 CpG sites has been released by TCGA. Additional studies that have surveyed methylation within gliomas have used McrBCbased and DMH-based genome wide approaches (Waha et al., 2005;Ordway et al., 2006). Indirect examination of the GBM methylome has been performed using microarrays to track gene expression changes in GBM cell lines after pharmacological treatment with demethylating agents (Foltz et al., 2006;Kim et al., 2006).
Despite the advent of novel approaches using next generation sequencing technology, restriction enzyme-based approaches relying on differential cleavage of recognition sequences containing methylated and unmethylated CpG dinucleotides continue to be widely used for genome-wide methylation detection. The modifi ed MSRE technique used in the present study is a variant of DMH that uses the HpaII/MspI isoschizomer restriction enzyme pair to determine methylation of CpG dinucleotides in the context of CCGG recognition sites (Yan et al., 2002;Adrien et al., 2006). Similar approaches, such as MIAMI and HELP, have also relied on this isoschizomeric pair combination (Hatada et al., 2006;Khulan et al., 2006). The major advantage of this strategy is that comparison to a separate reference (control) sample is not necessary, limiting false positive calls from polymorphic variation at restriction sites and incomplete digestion. Aside from DMH, there are other restriction enzyme approaches that do not utilize the isoschizomeric pair of HpaII/MspI, including the fractionation of genomic DNA by McrBC, an enzyme that cleaves methylated sites (Ordway et al., 2006;Pfi ster et al., 2007). Recently, this approach was used along with a custom-designed microarray called Comprehensive High-Throughput Arrays for Relative Methylation (CHARM) (Irizarry et al., 2008).
In this study, we demonstrate improved genome-wide methylation profi ling by coupling the MSRE approach described by Adrien et al. (2006) with Agilent high-density CpG island arrays, providing coverage of more than 20,000 CpG islands (Adrien et al., 2006). In silico analysis was used to optimize the assay, and our results pr ovide useful strategies for optimization of restriction enzyme approaches with available or custom microarrays.

AGILENT CpG ISLAND MICROARRAY
High-density two-color Human CpG island microarrays (printed using 60-mer SurePrint technology) were purchased from Agilent (G4492A, Agilent Technologies, Santa Clara, CA, USA). Originally designed based on the UCSC genome hg17 but remapped to hg18, each Agilent microarray (Design ID 014791) contains 237,220 probes that tile through each of the ∼27,000 CpG islands (including an extra 95 bp of fl anking sequence on each end) with an average probe spacing of 100 bp. Each probe on the array is identifi ed by its location on the genome and its associated gene(s) based on UCSC annotations (Agilent Probe Design ID 014791).

IN SILICO SIMULATION OF AMPLICON GENERATION AND HYBRIDIZATION
Hybridization to the Agilent CpG island array was simulated in silico based on the most updated Homo sapiens Genome hg18 within the UCSC database (Kuhn et al., 2009) and the probe design of the Agilent CpG island microarray (eArray, design ID 014791). Most of the in silico analysis was performed using CRAN R version 2.7.0 in conjunction with BioConductor 2.3 (Gentleman et al., 2004). The entire human genome hg18 was imported into R through package BSgenome and was 'digested' into fragments based on the location of BfaI restriction sites (C/TAG). Each Agilent CpG island microarray probe was mapped and assigned to a fragment based on its sequence. Probes containing the BfaI recognition site(s) (C/TAG) were determined to be problematic due to possibility of non-specifi c or dual fragment binding and are thus removed from analysis (designated as Partial binding probes in Table 1). The compiled list of probes with their hybridizing fragments, fragment size, and number of internal HpaII/MspI restriction sites were exported to a MySQL database and also into a BED fi le for the UCSC Genome Browser (Kent et al., 2002). Fragments without HpaII/MspI restriction sites (C/CGG) were designated as insensitive fragments and were expected to have identical MspI and HpaII intensities (i.e. log 2 ratio ∼0) (Figure 1). Therefore, probes corresponding to these insensitive fragments provided a large internal control group because they remain unaffected by the degree of methylation.

ACQUISITION AND DERIVATION OF GENOMIC DNA SAMPLES
Both the GBM tissue sample and normal human brain sample had been collected and archived through an IRB approved protocol and obtained from the UCLA Brain Tumor Translational Resource. The GBM tissue was derived from a treatment-naïve newly diagnosed patient. Normal brain tissue was derived from the right frontal lobe of a trauma patient. Both tissues were stored at −80°C until DNA extraction. Genomic DNA was isolated from the GBM and normal human brain samples and from the U87MG human GBM cell line using the Qiagen DNeasy Blood and Tissue Kit (Qiagen, Valencia, CA, USA).

Fully-methylated control
Methylation of all genomic CpG sites in vitro was achieved by treating 4 µg genomic DNA (normal brain) with 16 units of SssI enzyme (NEB, Ipswich, MA, USA) and 160 nmol S-Adenosylmethione for 6 h at 37°C twice. The product was purifi ed using the Zymo Clean and Concentrator Kit (Zymo Research Corp., Orange, CA, USA), in which the DNA was eluted using 20 µL of PCR-grade H 2 O. The typical yield after purifi cation was 2 µg. Using bisulfi te Frontiers in Neuroscience | Neurogenomics sequencing of uncloned DNA, sample CpG Island regions were verifi ed to have complete methylation of all CpG sites (data not shown).

Fully-unmethylated control
'Removal' of all methylation markings was achieved by subjecting 10 ng of genomic DNA (normal brain) to whole-genome amplifi cation with the GenomiPhi V2 Amplifi cation Kit (Amersham Biosciences, Piscataway, NJ, USA) according to manufacturer's instructions with unmodifi ed dCTP. The typical yield was 4-7 µg. Using bisulfi te sequencing of uncloned DNA, sample CpG Island regions were verifi ed to have complete absence of methylation of all CpG sites (data not shown).

AMPLICON GENERATION
Two separate experiments were performed for each sample except for the unmethylated control in which only one experiment was performed. Amplicon generation was based on the which is used as background noise in Feature Extraction software. Refer to Figure S2 in Supplementary Material for signal distribution). We then focused our analysis on fragments instead of the individual probes assuming that all probes binding to the same fragment should provide concordant information. To do so, we used the in silico simulation to group all probes binding to the same fragment together and used each group of corrected probe values to calculate the median M value for that fragment. Sensitive fragments were assigned z-scores and p-values based on their M values compared to this insensitive population, assuming a normal distribution for M values. In addition, log 2 of Cy-5 channel intensities (HpaII cut -rLog) of each sensitive fragment were also assigned p-values based on the distribution of the insensitive fragments. Overall, a fragment is called positive if it has a high M value and an adequate HpaII digest channel signal (features with signals lower than the bottom 5% of insensitive fragments were removed, p ≤ 0.95). Most positive fragments were in the 0.1-2.0 kb size range as predicted by the typical amplicon size range ( Figure S3A in Supplementary Material). This is also apparent from the decreased signal intensity with increased fragment size seen in the insensitive fragments ( Figure S3B in Supplementary Material). Various fragments were selected based on their M values and validated with bisulfi te sequencing (as described below). An empirical threshold was determined by performing a Receiver Operating Characteristics (ROC) analysis of the validated results. A BED fi le for the UCSC Genome Browser with detailed results was generated. For replicate experiments, data from M values and rLog values for both experiments were averaged after normalization. p-values were assigned after averaging the two experiments. The data has been uploaded into GEO (Submissions ID GSE19363).

METHYLATION-SPECIFIC (MSP) AND BISULFITE SEQUENCING FOR VALIDATION
To generate bisulfi te modifi ed DNA for manual validation, DNA samples were modifi ed using the EZ DNA Methylation-Gold Kit (D5006, Zymo Research Corp., Orange, CA, USA) following the manufacture protocol. This converts all unmethylated but not methylated cytosines to uracils.

Bisulfi te sequencing validation
Primers were designed with the assistance of the online tool MethPrimer (Li and Dahiya, 2002). To validate positive and negative calls, primers were designed to enable coverage of all CCGGs within the BfaI fragment. Sequencing reactions were set up using the designed primers DMH method with several modifi cations (Yan et al., 2002). Most notably, BfaI was used instead of MseI for genomic fragmentation. Briefl y, 750 ng of each genomic sample was digested with 25 units of BfaI (NEB, Ipswich, MA, USA) for 6 h at 37°C. This digested DNA was then ligated to oligonucleotide linkers consisting of annealed H-12 (5′-TAATCCCTCGGA-3′) and H-24 (5′-AGGCAACTGTGCTATCCGAGGGAT-3′) oligonucleotides. Following ligation, the genomic DNA was divided into two equal portions for separate digestion with 20 units of HpaII (methylation-sensitive isoschizomer) or 40 units of MspI (methylationinsensitive isoschizomer) (NEB). Each digest was subjected to PCR amplifi cation using the same H-12/H-24 primers with the following conditions: 72°C for 5 min, followed by 16 cycles of 97°C for 1 min, 62°C for 30 s, 72°C for 3 min, followed by 72°C for 10 min. Amplifi ed HpaII-digested DNA was labeled with Cyanine 5-dUTP (Cy-5) (Perkin-Elmer, Waltham, MA, USA), and amplifi ed MspI-digested DNA was labeled with Cyanine 3-dUTP (Cy-3) (Perkin-Elmer) using the Bioprime Array CGH Genomics Labeling Module (Invitrogen, Carlsbad, CA, USA) according to the manufacturer's instructions. The Cy-5 and Cy-3 labeled amplicons were then co-hybridized at 65°C for 40 h to the Agilent CpG island microarrays according to the manufacturer's instructions. The hybridized slides were washed using the Agilent Oligo aCGH/ChIP-on-Chip Wash Buffer Kit.

DATA ANALYSIS
Hybridized microarrays were scanned with an Agilent DNA Microarray Scanner (UCLA Microarray Core) to create an image fi le. From this fi le, Cy-5 (HpaII digested) and Cy-3 (MspI digested) intensities for each probe were determined by the Agilent Feature Extraction 9.1 software using the CGH protocol v4.95 Feb07 (Agilent Technologies). Two channel intensity readings were collected from the Agilent Feature Extraction CGH workfl ow (Feb 2007, using grid fi le 20070820) before any normalization was done (up to step 20 in Reference Guide from Feature Extraction 9.1). The tab delimited Feature Extraction output fi le (.txt fi le) was loaded into a MySQL database. Data analysis was done in R based on the limma package (Smyth, 2005). As the main tool to analyze the microarray data, MA plots were generated based on the local background correction signals, where M is the log 2 of the Cy-5/Cy-3 ratio and A is the average log 2 of the Cy-5 and Cy-3 intensities. A Loess curve was determined from the probes corresponding to the insensitive fragments (insensitive probes) and was used to correct every data point generated from the entire array (Cleveland et al., 1992). Application of the Loess correction enabled removal of the intensity dependent error from the two channels and also normalized the readings of insensitive probes, which were expected to have 1:1 ratio (or log 2 ratio equal to 0) between the two digests ( Figure S1 in Supplementary Material). Dye swapping experiments were not performed since it has been shown that such a Loess normalization without background subtraction appears to adequately correct for dye bias particularly in probes with high differential intensities between the dyes (suggesting that methylated calls will have relatively increased reliability) (Zahurak et al., 2007). The distribution of signal strength of insensitive fragments was signifi cantly higher than the noise level (measured by technical control spots on Agilent Microarray, directly on the bisulfi te treated sample sent to the UCLA Genotyping and Sequencing Core for sequencing. Any detectable cytosine peak signifi ed a methylated cytosine in genomic DNA.

IN SILICO SIMULATION OF MSRE METHOD APPLIED TO AGILENT CPG ISLAND MICROARRAY INDICATES THAT BfaI IS SUPERIOR TO MseI FOR GENOMIC FRAGMENTATION
As diagrammed in Figure 1, we used Agilent CpG island arrays in conjunction with the MSRE method described by Adrien et al. (2006). In terms of amplicon generation, the main difference was the use of BfaI (suggested by P. Yan, personal communication) instead of MseI for initial genomic fragmentation. To compare the theoretical CpG island coverage characteristics achievable with BfaI or MseI fragmentation, we performed in silico determination of all 'evaluable' CpG islands using the following criteria: (1) the CpG island has a hybridizing genomic fragment(s) tiled by Agilent probe(s) and (2) the fragment contains one or more HpaII/MspI (C/CGG) recognition site(s). The results of the in silico simulation are shown in Table 1.
First, we determined all CpG islands (retrieved from UCSC database, hg18) that could hybridize to probes and found 25,189 of 28,226 total CpG islands were tiled by the Agilent CpG island array at 100 bp intervals. In silico BfaI digestion of the UCSC hg18 human genome resulted in generation of 49,160 fragments (comprising ∼41 million base pairs, 832.4 bp average size) hybridizing to one or more of the 'useable' Agilent probe sequences, whereas MseI digestion resulted in generation of 35,651 fragments (comprising ∼43 million base pairs, 1,200 bp average size). Probes (designated as partial binding) containing internal fragmentation (BfaI or MseI) were omitted due to potential poor or multiple binding. These fragments were then subdivided into sensitive fragments, which have one or more internal HpaII/MspI (CCGG) sites, and insensitive

MSRE ON FULLY-METHYLATED SSSI TREATED BRAIN SAMPLE CONFIRMS DISTINCT M-VALUE DISTRIBUTIONS BETWEEN PREDICTED INSENSITIVE AND SENSITIVE FRAGMENTS
As shown in Table 1, BfaI fragmentation results in generation of 4,066 fragments designated as 'insensitive' based on the lack of internal HpaII/MspI sites. To confi rm that these fragments would remain 'insensitive' in a fully methylated sample, we analyzed fullymethylated (SssI-treated) and fully-unmethylated (whole genome amplifi ed) genomic normal brain DNA using MSRE as described in Materials and Methods. For the fully-methylated sample, we observed the expected shift in M-values between the sensitive fragments compared to the insensitive fragments. For the fullyunmethylated DNA sample, sensitive and insensitive fragments showed no signifi cant difference in their M-values (Figure 2). Closer inspection of the M distribution ( Figure 2C) indicates that there is a slight skew towards M > 0 in the methylated sample versus the unmethylated sample. This indicates that a small subset of the in silico identifi ed insensitive fragments may not actually be 'insensitive' , possibly due to the presence of polymorphisms in generating HpaII sites or altering BfaI sites. Given the negligibly small subset Frontiers in Neuroscience | Neurogenomics of these discrepant fragments, we utilized this 'insensitive' group of fragments by identifying corresponding probes (∼6,000) and using their M-values as an internal control for normalization of our experimental data and quality control (described in Materials and Methods). We analyzed the positive control data to investigate the ability to detect methylated fragments based on their size and their number of internal HpaII sites (Figure 3). This result confi rmed our proposed criteria for fragment size limitation between 0.1-2.0 kb (Figure 3A), with the optimal number of internal HpaII sites between 1-10 ( Figure 3B). We suggest that every fragment result can be annotated with the fragment length and number of internal CCGG to guide the interpretation.

MSRE ANALYSIS OF NORMAL BRAIN, U87MG, AND GBM PATIENT SAMPLES CAN IDENTIFY DIFFERENTIALLY METHYLATED FRAGMENTS
To apply MSRE to actual human genomic DNA samples, we performed two independent MSRE experiments on each of three samples -genomic DNA from normal brain (right frontal), the U87MG cell line and a GBM patient tissue -and averaged the results (M and A values). A list of differentially methylated fragments possibly relevant to brain cancer was selected for bisulfi te sequencing ( Table 2).
p-values for each fragment were assigned based on the M-value relative to the distribution of 'insensitive' fragment M-values as described in Materials and Methods. We initially chose fragments most likely to be differentially methylated in the three samples by applying arbitrary thresholds of p ≤ 0.005 for methylated and p ≥ 0.05 for unmethylated (p values were assigned using the insensitive population as described in Materials and Methods). Depending on whether we predicted the fragment to be methylated or unmethylated in normal, U87MG and GBM genomic DNA samples, fragments were grouped into MMM (i.e. methylated in normal, methylated in U87MG and methylated in GBM), UMM, UUM, MUU, UMU, and UUU categories. For each selected fragment, bisulfi te sequencing primers were designed, and uncloned DNA samples were subjected to bisulfi te sequencing as described in Materials and Methods. The DNA bisulfi te sequencing results were tabulated in Table 2. All CCGGs within a fragment were sequenced with a few exceptions (labeled X). For confi rmation of fragment As a measure of the agreement between the replicate experiments, from the six total experiments on the three samples, we found that 7 of 72 validations has discordant methylation calls between the technical replicates using the threshold of p ≤ 0.005 (Table S1 in Supplementary Material). In addition, unsupervised clustering methylation, we required methylation to be found at all fragment CCGG sites; in contrast, for confi rmation of lack of fragment methylation, we required that at least one CCGG was unmethylated even if other sites were methylated or unable to be sequenced. Out of the targeted 72 validations, we were able to generate 65 such confi rmations. Based on the initial arbitrary p-value thresholds, only 4 of Frontiers in Neuroscience | Neurogenomics using dChip (Li and Wong, 2001) based on z-scores of log 2 ratio of all fragments monitored by MSRE showed similarity of technical replicates ( Figure S5 in Supplementary Material).

DETERMINATION OF p-VALUE THRESHOLD BY ROC ANALYSIS
To determine the correct threshold for methylation discrimination, we used the validation results to perform a receiver operating characteristic (ROC) analysis. However, due to the selection criteria applied during the fi rst round of validation, we lacked data points corresponding to 0.005 < p < 0.05. Therefore, we selected another 15 fragments with p-values encompassing this region and performed uncloned bisulfi te sequencing on each sample. Results from these 15 fragments provided 45 more points for our analysis (bottom portion of  (Figure 4). Inspection of these graphs suggests that the optimal threshold lies between 0.001 ≤ p ≤ 0.005. At p ≤ 0.001, NPV = PPV = 87%, and at p ≤ 0.005 (Figure 4A), the FPR plateaus ( Figure 4B). All validation results were indicated in the MA plots ( Figure 5).
Using the threshold of p ≤ 0.001, we found that 5,521 fragments were methylated in the normal brain sample, representing 4,256 CpG islands, whereas we found 7,047 and 6,782 methylated fragments (5,174 and 5,045 CpG islands) for the U87MG and GBM patient DNA, respectively (Table 3 and Figure 6A). At this threshold, 3,125 fragments covering 2,417 CpG islands were commonly methylated in all three samples. To compare differentially methylated CpG islands between samples, we used a more stringent positive threshold (p ≤ 0.001, PPV = 93%) and negative threshold (p ≥ 0.01, NPV = 98%) to lower false positive/negative rate. With this comparison, we were able identify 70, 480, and 145 uniquely methylated CpG islands in normal, U87MG, and GBM respectively ( Figure 6B).

MRSE CAN DETECT MGMT PROMOTER METHYLATION
To examine whether MGMT promoter methylation could be detected by our approach, we compared the MSRE profi ling results with methylation-specifi c PCR (MSP) on MGMT (Hegi et al., 2004) and bisulfi te sequencing determination of the three samples. The MGMT CpG Island is covered by two BfaI fragments ( Figure 7A). One fragment (Fragment 1) is 502 bp and contains 12 CCGG sites. The other fragment (Fragment 2), coinciding with the MSP primers, is 265 bp in size and contains 3 CCGG sites. Using a threshold of p ≤ 0.001, the 502-bp fragment was not found to be methylated in any of the samples by MSRE whereas methylation of the 265-bp fragment (Fragment 2) was detected for the GBM sample, but not for U87MG or normal brain ( Figure 5). However, by MSP, both the U87MG and GBM sample but not normal brain are methylated ( Figure 7B). When this fragment was subjected to bisulfi te sequencing, we found that the GBM fragment was methylated at each of the three CCGG sites (Figure 7C). In contrast, the U87MG fragment was unmethylated at one of the three CCGG sites within the fragment, whereas the normal brain sample was unmethylated at two of the three CCGG sites.

DISCUSSION
In this study, we have described a genome-wide methylation profi ling approach utilizing MSRE combined with Agilent 244K CpG island microarrays that has been optimized via in silico restriction enzyme analysis of the human genome sequence. In silico comparison between MseI and BfaI fragmentation demonstrated that BfaI provides better coverage and fragment characteristics for MSRE when reasonably amplifi ed fragments sizes (<2.0 kb) are considered. To explore the feasibility of the method, we applied MSRE to genomic DNA from normal brain tissue, U87MG GBM cell line, and a GBM patient tissue sample. Uncloned bisulfi te sequencing was used to cross validate ∼120 fragments to determine the sensitivity and specifi city of the calls from our analysis. The validation results also provided the basis to estimate a threshold separating methylated and unmethylated fragments.
The MSRE approach is derived from DMH, one of the most commonly used techniques for large-scale methylation profi ling of cancer genomes (Huang et al., 1999;Yan et al., 2009). The main advantage MSRE has over DMH is that methylation determination relative to an independent reference sample is www.frontiersin.org not necessary since parallel HpaII and MspI digestions of the same sample are compared simultaneously (Adrien et al., 2006;Hatada et al., 2006). However, since such methylation sensitive and insensitive isoschizomers are essentially limited to this single pair, a possible advantage of DMH compared to MSRE is the potential to use an enzyme cocktail selected from the entire set of methylation sensitive restriction enzymes. However, in terms of coverage, our in silico simulation indicates that solely monitoring CCGG restriction sites enables coverage of nearly all CpG islands covered with only a small number of predicted insensitive fragments (8.3%). Furthermore, increasing the number of monitored restriction sites per fragment by using additional methylation sensitive restriction enzymes appears to be disadvantageous based on recognition that the major limitation of both DMH and MSRE is the necessity for every CCGG restriction site within a fragment to be methylated in tandem in order to detect methylation. Another important benefi t of the in silico analysis was identifi cation of MSRE insensitive fragments based on the lack of the internal CCGG sites. This small group of fragments can be easily tracked in every assay, and provided an internal control group that was used for normalization of our two-color data in order to determine the M-value threshold indicative of fragment methylation. In addition, this insensitive population provides the signal intensity range of binding fragments. Utilizing such information can reduce dye bias and batch effects between microarray experiments. From our bisulfi te sequencing validation of individual CpGs, we most commonly observed that fragments contained either complete methylation or absence of methylation at all CpGs (including those outside of CCGG sites). This observation provided additional support for the utility of restriction based approaches to detect 'native' genomic methylation. However, in a minority of instances, we observed patchy methylation (not full tandem methylation) of the CCGG sites within the fragment such that a predominantly methylated fragment would remain undetected by MSRE. One strategy to lessen the impact of this limitation is to use a restriction enzyme for fragmentation that generates more sensitive fragments with lower numbers of internal HpaII sites. We performed in silico simulations with various enzymes to determine numbers of sensitive fragments containing one to three HpaII sites. We found that AluI (34,474) and HpyCH4IV (29,320) could generate a signifi cantly increased number of sensitive fragments with low numbers of HpaII sites (1-3) compared to BfaI (15,868)     (1) decreased probe redundancy and (2) more 'partial binding probes'. Both of these problems can be addressed by custom array design tailored to the particular fragmentation scheme. Ultimately, this issue of patchy methylation demonstrates where 'massively parallel next-generation bisulfi te sequencing' will be advantageous in assessing not only CGs outside of a CCGG context, but also each CG independently of others within the same fragment. One issue that poses a limitation for methylation profi ling techniques such as MSRE is the inability to determine the relative abundance of methylation at a particular loci especially in the context of tissues containing a mixture of cell types. Our MSRE analysis enables only the categorical detection of methylation as being either present or absent. In theory, if individual fragments are not subject to any amplifi cation bias, then the HpaII intensity or the magnitude of the ratio may be useful to determine relative abundance of methylation. In future studies, quantitative bisulfi te validation techniques can be used to determine whether MSRE data can be used to give information on abundance of methylation. At this point, the major recommendation is to screen tissue for example to make sure the sample is predominantly tumor versus normal prior to DNA isolation minimize the mixture of cell types. A strategy of cell selection using laser capture dissection is limited by the DNA requirements of the assay. In light of the new TCGA methylation data set using Illumina Infi nium HumanMethylation27 (Illumina, San Diego, CA, USA) for GBM, we attempted to compare our MSRE result of single GBM sample against TCGA methylation results of GBM samples (118 samples). We mapped all 27,578 loci monitored by HumanMethylation27 to our MSRE fragmentation scheme and found 14,402 loci overlapping with 11,337 'evaluable' BfaI fragments ( Figure S6 in Supplementary Material). To determine overlapping methylated GBM loci/fragments in both platforms, we fi ltered the TCGA data for loci methylated in more than 50% of 118 samples (using beta-value ≥ 0.6 for methylated call). Of the 4,854 methylated loci satisfying these criteria, 718 loci are located within our BfaI fragments and 175 loci are located in methylated fragments in our GBM sample (p ≤ 0.001). This indicates that MSRE and HumanMethylation27 can detect methylation in not completely overlapping set of CpG islands.
In summary, this method provides a cost-effective tool for broad genome-wide CpG island methylation analysis of cancer genomes such as GBM that utilizes as little as 1 µg of sample DNA. In contrast to approaches using sonication for genomic fragmentation or MeDIP/anti-MeC antibodies for methylation discrimination, we could exploit the assay's use of restriction enzymes for both fragmentation and methylation determination to perform in silico simulations in the context of the Agilent CpG island probe design. In this way, we were able to predict overall coverage, identify an internal 'control' population of BfaI fragments not containing internal HpaII/MspI sites, understand fragment features infl uencing relative reliability of methylation determination such as fragment size, number of internal HpaII/MspI sites or probe reliability, and provide a rationale for selection of fragmentation enzymes and array design that enable increased 'effective' coverage. These data provide critical guidance to individuals seeking to determine the methylation status of individual CpG islands using restriction enzyme-based approaches. The methods can be adapted for other restriction enzyme approaches and can be applied to other array designs to indicate benefi cial compositional design alterations. For example, the importance of gene regulation via methylation of 'CpG island shores' has recently been identifi ed (Irizarry et al., 2009). MSRE coverage can be expanded to include these areas by altering the probe design and fragmentation enzyme. Half-fi lled circles indicated sites where C peak (methylated) and T peak (unmethylated) from bisulfi te sequencing were detected evenly.