Genome-Wide Analysis of DNA Methylation, Copy Number Variation, and Gene Expression in Monozygotic Twins Discordant for Primary Biliary Cirrhosis

Primary biliary cirrhosis (PBC) is an uncommon autoimmune disease with a homogeneous clinical phenotype that reflects incomplete disease concordance in monozygotic (MZ) twins. We have taken advantage of a unique collection consisting of genomic DNA and mRNA from peripheral blood cells of female MZ twins (n = 3 sets) and sisters of similar age (n = 8 pairs) discordant for disease. We performed a genome-wide study to investigate differences in (i) DNA methylation (using a custom tiled four-plex array containing tiled 50-mers 19,084 randomly chosen methylation sites), (ii) copy number variation (CNV) (with a chip including markers derived from the 1000 Genomes Project, all three HapMap phases, and recently published studies), and/or (iii) gene expression (by whole-genome expression arrays). Based on the results obtained from these three approaches we utilized quantitative PCR to compare the expression of candidate genes. Importantly, our data support consistent differences in discordant twins and siblings for the (i) methylation profiles of 60 gene regions, (ii) CNV of 10 genes, and (iii) the expression of 2 interferon-dependent genes. Quantitative PCR analysis showed that 17 of these genes are differentially expressed in discordant sibling pairs. In conclusion, we report that MZ twins and sisters discordant for PBC manifest particular epigenetic differences and highlight the value of the epigenetic study of twins.


INTRODUCTION
Primary biliary cirrhosis (PBC) is a female-predominant autoimmune liver disease affecting the small interlobular bile ducts, ultimately leading to periportal fibrosis and cirrhosis (1). Similar to most autoimmune diseases, PBC onset results from the interplay of genomic predisposition and environmental factors (2)(3)(4)(5) with a possible role for sex factors (6). Recent genome-wide association studies (GWAS) have reported consistent associations with polymorphisms of genes such as IL12RA and HLA class II in subgroups of patients with PBC (7-13) and a pathway analysis was recently performed (13). PBC concordance rates in dizygotic (DZ) and monozygotic (MZ) twins are significantly different being 0 and 63%, respectively, thus supporting the role of both genetic and environmental factors (14) with the latter supported also by epidemiology (15,16).
Promoter methylation influences gene expression (GEX) and our group recently reported differences in the DNA methylation and expression of two X-linked genes (PIN4 and CLIC2) in MZ twins discordant for PBC (17). On the other hand, copy number variations (CNV) are the result of duplications and other rearrangements (18) occur de novo at much higher rates than single nucleotide variants, and may regulate GEX (19). While www.frontiersin.org sharing their genomic sequence, MZ twins may develop different phenotypes over the years because of increasing differences in DNA methylation (20) and CNV (21,22).
We have taken advantage of a unique DNA collection of identical and non-identical twins with PBC and performed a genomewide investigation to determine differences in DNA methylation, CNV, and GEX. Our data identify 17 candidate genes that are significantly under-or up-regulated in affected individuals and we suggest that these might constitute new candidates as disease markers of genetic determinants. The value of this approach is highlighted and suggests the need for the study of a large number of patients and cell subpopulations (23) to support this thesis.

SUBJECTS
Blood samples from three MZ twins pairs discordant for PBC whose zygosity had been determined using microsatellite analysis (Ballestar) and eight sister pairs of similar age (within 5 years) discordant for PBC studied ( Table 1). Serum antimitochondrial antibodies (AMA) were positive at indirect immunofluorescence in all patients with PBC and none of the healthy twins and sisters. In these subjects, PBC was excluded when serum AMA was negative and serum alkaline phosphatase was within normal range on two different occasions. Genomic DNA was isolated from peripheral blood mononuclear cells (PBMCs) using the QIAamp Blood Midi Kit (Qiagen, Valencia, CA, USA) and stored at −20°C until used. Additional blood samples were obtained using Tempus™ Blood RNA Tubes (Applied Biosystems, Foster City, CA, USA) that were stored at −20°C until mRNA was extracted using the RNeasy Mini Kit (Qiagen, Valencia, CA, USA) and then stored at −80°C. This study was performed in compliance with the ethical standards of medicine and, following approval by the local IRB, informed consents were obtained from all patients and controls in accordance with the Declaration of Helsinki.

METHYLATED DNA IMMUNOPRECIPITATION AND METHYLATION ANALYSIS
DNA samples of three MZ twin sets (#1/2, 9/52, and 24/57; see Table 1) were sonicated and then immunoprecipitated with a monoclonal antibody that specifically recognizes 5-methylcytidine (Roche NimbleGen, Madison, WI, USA). DNA fragments were converted into PCR-amplifiable OmniPlex™ Library molecules flanked by universal primer sites and the library amplified by PCR using universal primers and a limited number of cycles. Immunoprecipitated and reference DNA were tagged, respectively, with cyanine-5 (Cy5) and cyanine-3 (Cy3)-labeled random 9-mers and then hybridized by the NimbleGen Array Hybridization Kit (Roche NimbleGen, Madison, WI, USA).
A four-plex array was custom-designed to include 998 X chromosome and 18,086 randomly selected autosomal chromosome promoter sites (Roche NimbleGen, Madison, WI, USA) and samples analyzed following the manufacturers protocols. First, Nimblescan software (Roche NimbleGen, Madison, WI, USA) was utilized for DNA methylation data analysis using a threshold p-value of 0.05 equivalent to 1.31 based on the Gaussian distribution of data. Second, exclusive elements corresponding to specific microarray probes were identified in affected and healthy subjects and peaks found only in either group were selected for further analysis. Third, elements of interest were inserted into the UCSC Genome Browser (GRCh36/hg19) to identify corresponding genes.

COPY NUMBER VARIATION ANALYSIS
Copy number variation analysis was performed on genomic DNA from one MZ twin set (#1/2; see Table 1) using the Infinium R HD Assay Super platform (Illumina, San Diego, CA, USA): in particular, we utilized the HumanOmni1-Quad BeadChip that includes markers derived from the 1000 Genomes Project, all three HapMap phases, and recently published studies (7,9,24,25) as well as adequate tools for quality control, CNV calling, and validation. The protocol included the initial DNA preamplification, fragmentation, and precipitation. Data obtained from four-plex chips were analyzed using iScan and Illumina BeadArray system (Illumina, San Diego, CA, USA) followed by the GenomeStudio software (Illumina, San Diego, CA, USA). The position of each probe and the number of copies for each probe were determined using the PennCNV platform based on a hidden Markov model algorithm (26). The UCSC Genome Browser was then used to determine the genes involved and the number of CNV.

MICROARRAY GENE EXPRESSION ANALYSIS
We utilized RNA samples from eight pairs of sisters of similar age ( Table 1) discordant for PBC. In the first part, we performed a whole-genome microarray comparison of transcripts to detect consistently up-or down-regulated genes in affected subjects. We obtained biotin-labeled cRNA using the Illumina R TotalPrep RNA Amplification Kit (Illumina, San Diego, CA, USA) and used the whole-genome Gene Expression Direct Hybridization Assay (Illumina, San Diego, CA, USA) including 24,500 transcripts. Microarrays were scanned using the BeadArray Reader (Illumina Inc., San Diego, CA, USA) and data were processed using Bead-Studio software (Illumina Inc.). Expression data were quantified using a cut-off for significant gene differences of p < 0.05 with a twofold difference in expression as described elsewhere (27).

RT-PCR EXPRESSION ANALYSIS
Real-time PCR was utilized to analyze samples prepared from 1 µg total RNA according to high-capacity cDNA reverse transcription kit (Applied Biosystems, Foster City, CA, USA) in seven pairs of sisters of similar age (#15/16, 5/14, 6/11, 7/12, 8/13, 26/27, 33/34; see Table 1). Micro-fluidic real-time quantitative PCR cards were customized to include single-plex assays for all candidate genes obtained with DNA methylation, CNV, and GEX analyses. Genes reported by GWAS studies were also included among the candidates (7,9,24,25). All samples were analyzed in duplicate, and included 94 candidate genes and the 18S and β-actin housekeeping genes. Analyses were performed using the ABI Prism 7900HT Sequence Detection System (SDS 2.2.2 software, Applied Biosystems, Foster City, CA, USA). PCR cycle conditions included 50°C for 2 min, 94.5°C for 10 min, and 40 cycles of 97°C for 30 s followed by 59.7°C for 1 min. The preliminary study of all 10 samples defined the maximum allowable cycle threshold (CT) that was set at 38 while outliers exceeding this threshold were excluded from the statistical analysis and no adjustment of p-value was performed. Internal controls for calculating expression levels of candidate genes were 18S and ACTB (beta-actin). The analysis has been performed with Data Assist version 3 statistical software (Applied Biosystems). The software exports data from real-time PCR and performs relative quantification analysis. The data assist analysis contains: C t data, sample design, assay design, average of C t values of replicates, ∆C t , normalized versus endogenous controls C t values ± SD and fold change (RQ) files, which displays RQ min and RQ max for each sample. p-Value is calculated from ∆C t files.
A heat map is used to visualize the data and illustrates, for all case/control sibling pairs, GEX in red/green color based on ∆C t values using Pearson's correlation. The neutral/middle expression was set as the median of all the ∆C t values from all samples, the red indicated an increase with a ∆C t value below the middle level and the green indicated a decrease with ∆C t above the middle level .

PATHWAY ANALYSIS
Gene networks were generated through the use of Ingenuity Pathways Analysis software 8.0. Edition (Ingenuity Systems, http: //www.ingenuity.com). Each gene identified was mapped to its corresponding gene object in the Ingenuity Pathways Knowledge Base and overlaid onto a global molecular network. The SDS 2.2.2 software (Applied Biosystems, Foster City, CA, USA) was used to determine changes in expression of a target in an experimental sample relative to the same target in a reference sample with the Student's t -test and p-value <0.05 were considered statistically significant. We utilized Data Assist Software version 3 statistical software (Applied Biosystems) and Stata 8.0 for MacIntosh (Stata Corp, College Station, TX, USA) for statistical analyses.

DNA METHYLATION
DNA methylation comparison showed 60 differentially methylated regions (DMR) in affected compared to the non-affected twin (p < 0.05 for each of the three discordant twin pairs). These DMR corresponded to 51 genes on the X chromosome and 9 genes on autosomal chromosomes, listed in Table 2. For each DMR, the PBC proband was hypermethylated compared with the non-affected twin.

COPY NUMBER VARIATIONS
Ten CNV were discordant between affected and the non-affected twin in one twin set. The healthy twin had four CNVs that were missing in the affected twin and six CNVs were present only in the affected twin. The CNVs were found in the following genes: RYBP ring 1, YY1 binding protein, HERV-V2 envelope glycoprotein ENVV2, POTEK P ankirin domain family member K pseudogene, THSD7A thrombospondin type 1 domain containing 7A = KIAA0960, GOLGA8A golgin A8 family member A, BPTF bromodomain PHD finger transcription factor, and C17orf58 open reading frame. Two additional CNV did not correspond to known genes.

MICROARRAY GENE EXPRESSION
Gene expression analysis using the genome-wide microarray showed two genes significantly down-regulated in PBC compared to the healthy sister in each of the eight discordant sister pairs. These genes were IFIT1 (interferon-induced protein with tetratricopeptide repeats; chromosome 10q23.31) and IFI44L (interferon-induced protein 44-like; chromosome 1p31.1) and both are interferon-induced (28).

RT-PCR ANALYSIS
To provide additional support for our initial findings, we used RT-PCR to evaluate expression of each of the candidates that emerged from the DNA methylation (60), CNV (10), and expression studies (2), as well as previously reported GWAS in seven pairs of discordant sisters of similar age ( Table 1) (7-9, 12, 13, 24, 25). Our data assist analysis contained: C t data, sample design, assay design, average of C t values of replicates, ∆C t , normalized versus endogenous controls C t values ± SD and fold change (RQ) files, which displays RQ min and RQ max for each sample. p-Value was calculated from ∆C t files. Data assist v3.0 software was used with results exported from real-time PCR and for relative quantification analysis. Graphic result in heat map visualized analyzed data (Figure 1). Heat map showed, for all case/control sibling pairs, genes expression in red/green color based on ∆C t values using Pearson's correlation. The neutral/middle expression was set as the median of all the ∆C t values from all samples, the red indicated an increase with a ∆C t value below the middle level and the green indicated a decrease with ∆C t value above the middle level. The heat map from all samples is represented in Figure 1. Among the entire set of candidate genes, we found five genes that were underexpressed in at least three of seven sibling pairs with FC < 0.5 (CXCR5, HLA-B, IFI44L, IFIT1, SMARCA1) and one overexpressed gene in at least three of seven pairs with an FC > 2 (IL6 ). Additional 11 genes showed a widely variable expression profile in each sibling pair (CD80, FAM104B, HLA-DQB1, HLA-DRB1, HLA-G, MTCP1, NHS, PIN4, PRPF38A, THSD7A, and TNFAIP2) (Table 3; Figure 2).

PATHWAY ANALYSIS
Pathway analyses were performed using the 17 resulting genes from our study and demonstrated that the most representative functions www.frontiersin.org

DISCUSSION
Primary biliary cirrhosis is considered a prototypic autoimmune disease because of the clinical homogeneity between patients and the relative consistency in natural history and pathology. Although relatively uncommon, several independent GWAS (7-13) have identified associations with transcription factors that further suggest a potential role for epigenetic shifts and thus our approach using this unique collection of DNA is a particularly important resource. We are aware of the numerous limitations of our study and that the observed changes in GEX may be stochastic rather than secondary to disease progression or involved in pathways involved in PBC pathogenesis, as suggested for other autoimmune diseases (29)(30)(31)(32). The latter includes the possibility of portal hypertension and resulting leukopenia. We identified 60 DMR and 10 CNV between discordant MZ twins with 14 (20%) also differently expressed between PBC cases and control sisters, thus being stronger candidates as PBC biomarkers or determinants. One of the strengths of our study is the confirmation of identified genes by quantitative PCR and that this approach was extended also to genes identified in recent GWAS allowing identification of six genes differently expressed in PBC mononuclear cells. First, these genes support a down-modulation of Th2-cytokines such as IFIT1, an interferon type I signature represented by IFI44L, in favor of a fibrogenic phenotype as represented by the IL6 up-regulation (33). Regarding this last observation, we note the apparent discrepancy between DNA methylation and GEX of IL6 but we recognize that methylation does not fully correlate with GEX, and the difference could be explained by different mechanisms such as allele-specific methylation (34,35) (Table 4). Second, a single DMR-associated gene, i.e., hypermethylated SMARCA1, manifested a reduced GEX confirmed in our RT-PCR study of sibling pairs. SMARCA1 is a transcription www.frontiersin.org regulator that modulates the chromatin structure and is involved in apoptosis, DNA damage, and differentiation. Moreover, the gene encodes for a member of the SWI/SNF family of proteins, which FIGURE 1 | Heat map showed, for all case/control sibling pairs, genes expression in red/green color based on ∆C t values using Pearson's correlation. The red indicated an increased expression with a ∆C t value below the middle level and the green indicated a decreased expression with ∆C t value above the middle level.
are master regulators of GEX, regulating expression among others FOS, CSF-1, CRYAB, MIM-1, p21 (also known as CDKN1A), HSP70, VIM, and CCNA2; SWI/SNF has also been reported to modulate alternative splicing (36). Third, 5/7 sibling pairs had consistent dysregulation of CXCR5 being down-regulated in PBC lymphocytes, which may reflect a compartmentalization of CXCR5+ cells within the liver or may reflect the chronic activation of B cells, as reported in rheumatoid arthritis (37). In fact, the chemokine receptor CXCR5 is expressed by B and T cells and controls their migration within lymph nodes while its ligand BCA-1/CXCL13 is present in lymph nodes and spleen and also in the liver. A downregulation of CXRC5 is correlated with an increased production of IL-2, which may cause the production of immunoglobulins by B cells; IL-2 is normally produced by T cells during an immune response. IL-2 is also necessary during T cell differentiation in Treg, which are involved in self antigens recognition, which could result   in autoimmunity (38). Of note, following B cell activation and differentiation into plasma cells and memory cells, CXCR5 becomes down-regulated while the same effect is induced in vitro following anti-CD40 stimulation (39) and CD40L methylation appears to be altered in PBC (40). Fourth, HLA-B is also down-regulated in PBC, similar to several types of cancer (41)(42)(43). The majority of the identified genes map on the X chromosome, in agreement with the female predominance of the disease, and are involved in many cellular pathways. Our group in a previous work assessed the expression of 125 genes with variable X inactivation status and found that two genes (CLIC2 and PIN4) were consistently down-regulated in PBC affected twin of discordant pairs (17). Three genes are differentially methylated in lymphocytes of patients with PBC and systemic sclerosis (32) and may thus be representative of general autoimmunity or fibrosis development; these genes include MTM1 hypermethylated in PBC and in systemic sclerosis while SSR4 and IGH3G are hypomethylated in both diseases. Of note, a recent study reported the up-regulation of the X-linked costimulatory molecule CD40L (40) but our data failed to confirm such hypomethylation in our cohort. The CNV differences observed in our MZ twin set warrant some further observations as the de novo post-twinning CNV frequency was estimated to be as high as 5% on a per-individual basis or 10% per twinning event (21). While the impact of CNV on GEX can vary (44), it would be of great interest to obtain parental information to determine the origin and timing of CNV in the offspring. On the other hand, there are several limitations to our data. PBC is relatively uncommon and our DNA collection reflects a several-year worldwide search; it is nonetheless a limited dataset. In addition, there is only limited information available using PBMC. PBC is an organ-specific disease affecting small intrahepatic bile ducts and thus studies of the portal infiltrating lymphocytes will provide a more valuable resource as would a detailed and well-defined lymphoid cell populations. These comments notwithstanding, the data obtained are intriguing and consistent with our thesis that one explanation for discordant MZ twins is DNA changes on the critical genomic element involved in disease susceptibility and these observations should be recapitulated also in unrelated pairs of patients and controls. With the increased interest in the balance between genetic susceptibility, it becomes critical for research groups to combine resources and improve access to clinical material and data that permits more extensive studies and the potential for more powerful statistical analysis and interpretation.