Validation of Genome-Wide Intervertebral Disk Calcification Associations in Dachshund and Further Investigation of the Chromosome 12 Susceptibility Locus

Herniation of the intervertebral disk is a common cause of neurological dysfunction in the dog, particularly in the Dachshund. Using the Illumina CanineHD BeadChip, we have previously identified a major locus on canine chromosome 12 nucleotide positions 36,750,205–38,524,449 that strongly associates with intervertebral disk calcification in Danish wire-haired Dachshunds. In this study, targeted resequencing identified two synonymous variants in MB21D1 and one in the 5′-untranslated region of KCNQ5 that associates with intervertebral disk calcification in an independent sample of wire-haired Dachshunds. Haploview identified seven linkage disequilibrium blocks across the disease-associated region. The effect of haplotype windows on disk calcification shows that all haplotype windows are significantly associated with disk calcification. However, our predictions imply that the causal variant(s) are most likely to be found between nucleotide 36,750,205–37,494,845 as this region explains the highest proportion of variance in the dataset. Finally, we develop a risk prediction model for wire-haired Dachshunds. We validated the association of the chromosome 12 locus with disk calcification in an independent sample of wire-haired Dachshunds and identify potential risk variants. Additionally, we estimated haplotype effects and set up a model for prediction of disk calcifications in wire-haired Dachshunds based on genotype data. This genetic prediction model may prove useful in selection of breeding animals in future breeding programs.


INTRODUCTION
In the dog, herniation of the intervertebral disk is a common cause of neurological dysfunction. Especially the Dachshund is predisposed with a relative risk 10-12 times higher than all other breeds (Priester, 1976;Goggin et al., 2000) and an estimated lifetime occurrence of 19% (Ball et al., 1982). The intervertebral disks lie between adjacent vertebrae in the vertebral column forming cartilaginous joints that allow slight movements between vertebrae. The disks are complex structures consisting of a gelatinous core called the nucleus pulposus, an outer fibrous ring called the annulus fibrosus, and the cartilaginous endplates representing the cranial and caudal boundaries of the intervertebral disk. In the Dachshund and other hypochondroplastic breeds the predisposition to intervertebral disk herniation is the result of an early degenerative process, which can result in disk calcification (Hansen, 1952). The degeneration is preceded by early chondroid Abbreviations: BLUP, best linear unbiased prediction; CFA, canine chromosome; DDC, Danish Dachshund Club; GLM, generalized linear model; GWAS, genomewide association studies; LD, linkage disequilibrium; ncRNAs, non-coding RNAs; qPCR, quantitative fluorescence PCR; RMD, residual mean deviance; R 2 , coefficient of determination; UTRs, untranslated regions. metaplasia emerging from the perinuclear zone and affecting the majority of the nucleus pulposus and perinuclear annulus fibrosus with profound matrix changes occurring within the first year of life (Hansen, 1952;Ghosh et al., 1976). Dogs with several disk calcifications are at particular high risk of herniation, while herniation rarely occurs in dogs without disk calcifications (Stigen, 1996;Lappalainen et al., 2001). A radiographic evaluation of the number of calcified disks at 2 years of age is a good indicator for the severity of the degeneration and associates strongly with the occurrence of clinical disk herniation at a later age (Jensen et al., 2008). The severity of disk degeneration among breeds describes a continuous spectrum suggesting a multifactorial etiology involving the cumulative effects of several genes and environmental factors (Ball et al., 1982). Severe disk degeneration with calcification has previously been shown highly heritable in Dachshund with heritability estimates of 0.47-0.87 (Jensen and Christensen, 2000). To decrease the occurrence of clinical disk herniation in the Danish Dachshund population the Danish Dachshund Club (DDC) has established breeding guidelines. Based on radiographic examinations at 24-42 months of age the number of calcified disks is determined and since 2008, DDC has recommended excluding dogs with ≥5 calcified disks from breeding. Since 2009, screening of breeding dogs www.frontiersin.org has been mandatory and breeding values of disk calcification have been estimated, using a BLUP (Best Linear Unbiased Prediction) Animal model.
Within the past few years genome-wide association studies (GWAS) have identified numerous promising signals of association between genetic variants and human traits. The use of high density SNP arrays have also shown strength in disease mapping in dogs and has opened doors toward a greater understanding of the genetic architecture of several complex diseases (Wood et al., 2009;Wilbe et al., 2010;Madsen et al., 2011). The genetic homogeneity existing within dog breeds and the spontaneous occurrence of specific diseases in different breeds indicate a breed specific accumulation of disease causing genetic factors. This provides the dog with some advantages in studying genetic diseases as fewer markers and individuals are needed when compared with human studies (Sutter et al., 2004;Lindblad-Toh et al., 2005). The association signals identified through GWAS most likely represents only markers of putative risk and not the causal variant itself. Therefore, to generate hypothesis about mechanisms underlying a specific phenotype it is important to identify the causal variants themselves. This is often a difficult task and requires extensive efforts. The dog provides an excellent model to study complex diseases through the use of GWAS due to the extensive LD and long haplotype blocks characteristic of single dog breeds. However, because of long ranging LD in the dog genome, disease-associated haplotype blocks are often large, hampering the identification of the causal variant. Consequently, while the high extent of LD existing in the dog population is an advantage in the initial GWAS it may complicate the subsequent identification of the causative variant(s) (Sutter et al., 2004).
To investigate the underlying genetic mechanisms behind disk calcification, blood samples from Danish Dachshunds were collected through collaboration with the DDC. Previously, based on a GWAS in 33 cases and 28 controls using the Illumina CanineHD BeadChip, we identified a major locus associating with intervertebral disk calcification in wire-haired Dachshunds on a genomewide level on canine chromosome (CFA) 12 nucleotide positions 36,750,524,449. We discovered 36 markers within the genomic region with p-values between 0.00001 and 0.026 after correcting raw p-values for multiple testing by permutation. This provided clear evidence of the region harboring genetic components affecting the development of disk calcification and thus the risk of disk herniation in wire-haired Dachshunds (Mogensen et al., 2011). The associated locus however requires additional exploration to refine the location of the causal variant(s).
This study was performed within the LUPA project (LUPA) 1 to validate the original GWAS finding and characterize the CFA12: 36,750,524,449 susceptibility locus. Targeted resequencing was performed to identify potential functional SNPs that could explain the association signal and the local LD pattern across the disease-associated region was defined. Furthermore, haplotype window effects on disk calcification were estimated, to pinpoint a sub region more likely to harbor the causal variant(s). 1 http://www.eurolupa.org/

RESULTS
The disease-associated region contains a total of seven annotated protein coding genes in Ensembl (version 66.2); RIMS1, KCNQ5, DPPA5, C6orf221, OOEP_CANFA, DDX43, and MB21D1. Furthermore, the region harbors a number of non-coding RNAs (ncRNAs): cfa-mir-30c-2, cfa-mir-30a as well as three novel ncR-NAs. As none of these genes or ncRNAs have previously been known to influence disk calcification resequencing was used to generate a list of potential mutations that could explain the association signal. Using the NimbleGen Sequence Capture technology and the Illumina platform we enriched and sequenced the target region in one affected and one unaffected dog of wire-hair. A summary of the statistics describing the resequencing data is given in Table 1. Enrichment of the selected genomic region resulted in 631 and 356 fold enrichment for the affected and unaffected sample, respectively, compared to the non-enriched library. A high coverage was achieved for both samples with >96% of the target region being covered by at least one read and >70% of the reads mapping uniquely to the target region.
Using the MAQ software (Li et al., 2008) to infer variants from the alignment, we identified 4119 SNPs and 377 indels in the affected dog and 2956 SNPs and 250 indels in the unaffected dog compared to the reference sequence (CanFam2.0) for the domestic dog (Canis familiaris; female boxer). The case was homozygous for three SNPs in protein coding regions or untranslated regions (UTRs): two synonymous SNPs in MB21D1 and one SNP in the 5 -UTR of KCNQ5, see Table 2. These three variants where selected for genotyping in 56 unaffected and 28 affected wire-haired dogs of standard size. All three variants were found to associate with disk calcification with the SNP in the 5 UTR of KCNQ5 showing the strongest association with a p-value of 1.4 × 10 −7 , see Table 3. A list of the predicted functional effect on disk calcification for SNPs identified during resequencing can be found in Table A1 in Appendix. By genotyping the three SNPs in a sample of long-and smooth-haired Dachshund, we found no association to disk calcification, data not shown. Instead dogs of these two hair-varieties seem to be fixed for the genotype of affected wire-haired dogs.
The ∼1.8-Mb genomic region on CFA12 associating with disk calcification in Danish wire-haired Dachshund (Mogensen et al., Frontiers in Genetics | Genetic Architecture 2011) encompass seven LD blocks identified using the four gamete rule (Wang et al., 2002) in Haploview, see Figure 1. The LD blocks range from 20 to 487 kb in size and all blocks include one or more markers significantly associating with disk calcification on a genome-wide level. The marker with the lowest p-value corrected for multiple testing (P genome = 0.00001) is located at nucleotide position 37,480,959 in LD block 3, which spans 185 kb in size. Linear and logistic regression analyses were performed to investigate the effect of the haplotypes within each window on disk calcification. The maximal number of haplotypes is 2 n , where n is the number of SNPs in a window, which mean that 16 haplotypes could be expected in a four-SNP window. However, with the dataset available and the high extent of LD the observed haplotypes for each of the nine haplotype windows ranged from two to four. The overall significance of which haplotype window explained more genetic variation than the other windows were assessed by the coefficient of determination (R 2 ), which provides a measure of how well the haplotype effects fitted in the model predicts the disease outcome (case/control) for a particular dog. In generalized linear model (GLM), residual mean deviance (RMD) was used as an indicator for variance explained by the haplotype window and thus the lower the RMD the better is the model fit. Looking at both the linear model and GLM all haplotype windows are significantly associated with disk calcification; see Table 4. Of the nine haplotype windows, we have identified haplotype window 3 as explaining the highest proportion of variance in the disk calcification dataset followed by haplotype window 1 and 2. Haplotype window 3 CFA12: 37,123,193-37,494,845 covers a part of LD block 2 and the entire LD block 3 identified in haploview. Test of association with disk calcification for particular haplotypes within the different haplotype windows, based on both the linear model and GLM are given in Table A2 in Appendix.
Based on these analysis we are able to set up a genetic predictions model for disk calcificationsŷ i in Dachshunds of the wirehaired variety given their haplotype or genotype information; where,ŷ i is the predicted disk calcification for individual i,α o is the intercept,Ŝ i is the estimated sex effect for the i th individual, andα î α i is the estimated effect for haplotype H ij for i th individual with haplotype J. Individuals with the least common haplotype were assigned the reference levelα o .

DISCUSSION
We have previously shown that the CFA12: 36,750,205-38,524,449 genomic region associates with disk calcification in wire-haired Dachshund on a genome-wide level (Mogensen et al., 2011). However, a comprehensive study of sequence variation within the region is required to identify the causal variant(s) that might explain the association signal. In this study we have investigated genetic variation within the target region through targeted resequencing in order to identify potential risk variants and validate original GWAS findings. To further investigate the locus we have identified LD block pattern across the disease-associated region and estimated the genetic variation explained by the different haplotype windows. Finally, we have developed a risk prediction model for wire-haired Dachshunds, using the disk calcification and haplotype dataset. Functional SNPs may have variable effect on protein sequence, transcriptional regulation, splicing, microRNA-and transcription factor binding sites depending on their position and flanking sequences. By targeted resequencing we have made a comprehensive list of potential causal variants that could explain the association signal. A ranking of these SNPs is necessary for followup studies to be possible. Numerous SNPs, identified in this study, are predicted to be located within transcription factor binding sites or microRNA-binding sites. Due to the high number of cases sharing the same haplotype we have focused on variants within protein coding regions or UTRs for which the case is homozygous. We have validated the association of one variant in the UTR   Table A4 in Appendix. The horizontal dotted line represents the threshold of genome-wide significance. The graphical representation of the LD pattern across the region is generated in Haploview 4.2. LD is specified using the r 2 -color scheme: r 2 = 0: white; 0 < r 2 < 1: shades of gray; r 2 = 1: black. The black horizontal lines in the Manhatten plot correspond to the position of the LD blocks defined in Haploview. of KCNQ5 and two synonymous variants in MB21D1 in an independent sample of wire-haired Dachshunds hereby confirming the original GWAS and thus providing further evidence for the association of this region with disk calcification. Disk herniation is also seen in long-and smooth-haired Dachshunds. However, interestingly, both cases and controls within these two hair variants appear to be fixed for the haplotype found in wire-haired cases. Thus, presumably other loci must be involved in the development of the disease in long-and smooth-haired variants. This hypothesis is supported by the fact that when 18 controls and 15 cases of long-and smooth-hair were included in our original GWAS (Mogensen et al., 2011), an additional locus, not appearing when including only wire-haired dogs, was detected on CFA3. However, more dogs are needed to confirm this hypothesis. In terms of SNPs validated in the wire-haired dogs any of the three variants may have a potential functional impact on the phenotype in wirehaired dogs. However, it is more likely that these SNPs are markers in high LD with the actual causal variant(s). Resequencing of the target region in a larger number of affected and unaffected dogs might be necessary to eliminate some of the identified variants before a thorough follow-up on other highly ranked variants can be carried out.

Frontiers in Genetics | Genetic Architecture
To characterize the CFA12 locus and potentially narrow down the candidate region we looked at the LD block pattern. Haploview identify seven LD blocks across the region associating with disk calcification. Several of the markers showing genome-wide significance are in strong LD (r 2 > 0.8) with genome-wide significant markers in other LD blocks indicating the presence of strong LD within the disease-associated region. That this genomic region falls into a segment of strong LD is further documented by 28 of the 33 cases in the GWAS sharing the same haplotype across all 36 genome-wide significant markers within this region (Mogensen et al., 2011). In addition several of the markers show more or less equivalent evidence of association for the given signal indicating that the markers are highly correlated. Given the high extent of LD within this region it is difficult to resolve whether two or more independent loci contribute independent effects to disk calcification.
Analyzing haplotype window effects could potentially pinpoint a haplotype window with a higher effect on disk calcification and thus define or narrow down the region of interest. By estimating the effect of the haplotype windows we have identified window 3 CFA12: 37,123,193-37,494,845 as explaining the largest part of the genetic variation between dogs in our dataset (76%) followed by haplotype window 1 and 2 explaining 73% of the genetic variation. From these results it therefore seems most likely that the causal genetic variant(s) are to be found within the CFA12: 36,750,205-37,494,845 genomic region, which harbors the ncR-NAs cfa-mir-30c-2 and cfa-mir-30a as well as a part of RIMS1. However, all haplotype windows explain a fair proportion of the variance in the dataset, which is not surprising due to the large amount of LD within this region. Therefore one needs to be careful when narrowing down the region to these three haplotype windows.
A genetic prediction model for intervertebral disk calcification based on these haplotype effects analyses may form a valuable tool for genetic counseling in the wire-haired Dachshund population.
Genome-wide association studies has to a large extent focused on the detection of effects attributable to common SNPs. Other sequence variants such as rarer variants (MAF of 1-5%) and structural variants are also expected to contribute to the genetic basis of common disease and efforts to detect these genetic variations should be included in future studies. Even when a true causal variant is identified challenges remain in reconstructing the molecular mechanisms whereby the variant have an impact on the phenotype of interest and even more work is necessary in translating these findings into advantages in clinical care. Based on a literature search no genes with a direct biological link is present within the disease-associated region one could speculate whether the region contains a regulatory element controlling the expression levels of a causal gene located either upstream or downstream of the candidate region identified here. One hypothesis is a regulatory variant affecting the expression level of COL9A1. This gene is located ∼1 Mb upstream of the disease-associated region and encodes one of the three alpha chains of collagen IX. Collagen IX serves as a minor component in the annulus fibrosus and the nucleus pulposus and is thought to be involved in maintaining network integrity in the normal disk. Mutations in COL9A2 and COL9A3 have previously been linked to human disk disease (Annunen et al., 1999;Paassilta et al., 2001) and studies in transgenic mice have further demonstrated that mutations in collagen IX can lead to disk degeneration but also degenerative joint disease (Kimura et al., 1996).

CONCLUSION
In the present study we validate the previously identified association of the locus CFA12: 36,750,205-38,524,449 with disk calcification in an independent sample of wire-haired Dachshund thus providing strong evidence that variation within this locus affect the development of disk calcification in wire-haired Dachshunds. Moreover, our results suggest that the locus falls within a region of strong LD hence complicating the identification of the causal variant. Our predictions on the effect of the nine different haplotype windows on disk calcification imply that the causal variant(s) are to be found within the CFA12: 36,750,494,845 genomic region, however care must be taken when drawing this conclusion as all haplotype windows explain a reasonable part of the variability in the disk calcification dataset.

ANIMALS AND DIAGNOSTIC PROCEDURES
This study was confined to Dachshund registered in the DDC. All blood samples included in this study were collected by licensed veterinarians with owners' consent. Inclusion criteria for sampling were based on radiographic examinations of intervertebral disk calcifications from the second cervical vertebra to the third sacral bone at age 24-42 months (Jensen and Ersbøll, 2000). Information regarding size (standard, miniature, and rabbit), hair variant (wire-haired, long-haired and smooth-haired) sex, age, and pedigree records were obtained from the Danish Kennel Club registry. Disease status of cases and controls were scored based on standard protocol for radiographic examinations; cases were classified as dogs with either ≥6 disk calcifications or dogs that had undergone surgical treatment for disk herniations. Controls were classified as dogs with ≤1 disk calcification. For further information on the distribution of disk calcifications among cases and controls (see Mogensen et al., 2011).

NimbleGen SEQUENCE CAPTURE ARRAY DESIGN AND DATA ANALYSES
For targeted resequencing one affected and one unaffected dog was selected. The affected dog had 12 disk calcifications as evaluated from the radiographic examination and was homozygous across the 36 significantly associated markers in the diseaseassociated region. The unaffected dog had no disk calcifications and was homozygous for the opposite alleles of the affected dog across the entire region. Both were female standard wirehaired dogs and unrelated at great grandparental level. A custom tiling NimbleGen 385K sequence capture array targeting CFA12: 36,702,574,449 on CanFam2.0 was designed and manufactured by Roche NimbleGen, Madison, WI, USA. The probe set design was approved with the fraction of bases in the target region covered by probes being 96.5%. Genomic DNA was captured following the NimbleGen Sequence Capture protocol (Roche NimbleGen, Madison, WI, USA). In brief, 25 µg genomic DNA was fragmentized by sonication to blunt-ended fragments and hybridized to the custom array. Unbound fragments were www.frontiersin.org washed away. The target-enriched pool was eluted and recovered from the array and amplified by ligation-mediated PCR. Quantitative fluorescence PCR (qPCR) was performed on pre-and post-enriched libraries to calculate relative-fold enrichment of the targeted region. A locus within the target region was selected for qPCR enrichment analysis with the Stratagene Mx3000P qPCR system using the following primers designed using Primer-BLAST (Primer BLAST) 2 : F: 5 -TGCCTCTGTTGTCCACAGTCAGA-3 ; R: 5 -TGCTTGGGGACCTCCTGTCACC-3 . One microgram of captured libraries were subsequently sequenced on the Illumina Genome Analyzer platform as paired end 2 × 36 sequencing reads following the Genome Analyzer User Guide. Bowtie (Langmead et al., 2009) was used to align short read sequence data against the CanFam2.0 reference genome and sequence variants were identified running MAQ (Li et al., 2008) on the reads aligning uniquely to the region.
All SNPs identified from resequencing were evaluated according to their potential functional effect on disk calcification. The SNPs were compared to Ensembl Canis familiaris version 64.2 annotations and predictions and SNPs in protein coding regions or within or near predicted ncRNAs were identified. Further SNPs were evaluated based on a measure of conservation in dog, human, mouse, and rat, position according to transcription start site and end site and if the SNP was likely to change the predicted binding of transcription factors or predicted ncRNAs.

VALIDATION OF GWAS FINDINGS USING TAQMAN ® SNP GENOTYPING ASSAYS
Three SNPs at nucleotide position 37,871,992, 38,513,135 and 38,514,745 were genotyped using Custom TaqMan® SNP Genotyping assays (Applied Biosystems, Foster City, CA, USA) in an independent sample of wire-haired dogs not included in the original GWAS. The sample included 56 controls and 28 cases that had undergone a thorough radiographic examination to determine affection status as descried previously. The primers and probes obtained from the ABI assay kit are specified in Table A3 in Appendix. Reactions were carried out according to the manufacturer's protocol. Briefly, PCR was performed in the presence of 10 ng genomic DNA, TaqMan® Universal PCR Master Mix, and the SNP Genotyping Assay specific for each SNP. The thermal cycling conditions on Mx3000P™ (Strategene) were 95˚C for 10 min, followed by 60 cycles at 92˚C for 15 s and 60˚C for 1 min. Results were analyzed using the MxPro software and the SNPs were tested for genotypic associations with disk calcification using chi-square test statistics.

ANALYSIS OF LD PATTERN IN HAPLOVIEW
The LD pattern of all 117 SNPs covering the CFA12: 36,750,205-38,524,449 genomic region were analyzed in Haploview 4.2 ( Barrett et al., 2005) using SNP genotyping data from the original GWAS with 33 wire-haired cases and 28 wire-haired controls. The four gamete test (Wang et al., 2002) implemented in Haploview using default parameters were used to define the LD block structure and create a graphical representation of the LD pattern. The level of LD is represented by r 2 -values. 2 http://ncbi.nlm.nih.gov/tools/primer-blast

ESTIMATION OF HAPLOTYPE EFFECTS ON DISK CALCIFICATION
The effect of haplotypes in nine haplotype windows was estimated using data from our previous GWAS on disk calcification (Mogensen et al., 2011). The 30 cases and 23 controls included in the analyses were all of standard size and wire-haired to keep the population as genetically homogeneous as possible. For the 36 genome-wide significant markers within the CFA12: 36,750,524,449 genomic region we defined haplotype windows with four-SNPs creating nine haplotype windows, see Table A4 in Appendix. The haplotype frequencies and most likely haplotype pair (linkage phase) for each dog were estimated from genotyping data using PHASE v.2.1.1 (Stephens et al., 2001). Since haplotypes are reconstructed from genotype data, there are always two haplotypes per dog for each haplotype window. From the PHASE data each dog was assigned a score of 0, 1, or 2 corresponding to 0 copies, 1 copy, or 2 copies of a given haplotype in a haplotype window. Using this haplotype count data, we estimated the effect of each window on disk calcification in dogs. Preparation of data files and methods used for estimating haplotype substitution effects were according to those described for allele substitution models by Kadarmideen et al. (2011) and Kadarmideen (2008). Estimations of haplotype effects on disk calcification was done on a binary scale as cases = 1 (classified as dogs with ≥6 disk calcifications) and controls = 0 (classified as dogs with 0 or 1 disk calcification). Information on sex was included as fixed effects. All analyses were performed in ASReml 3.0 (Gilmour et al., 2002). Linear and logistic regression models were fitted to binary case/control scores on disk calcification. A standard linear haplotype substitution model was: where, for individual i, α 0 is the intercept, S i is the sex, and ε i is the residual. The term cH iJ is the number of copies (0, 1, or 2) of haplotype J (1 to p). The least common haplotype was set as a reference level (=α 0 ) and the effect of the other haplotypes represents the relative haplotype effect compared to this reference level.
To take the binomial distribution of case/control data we fitted a GLM using the logit link function. The model took the following form: where π i is the probability of observing a case y i = 11 − π i is the of probability of observing a control y i = 0. All analyses were conducted for each haplotype window one at a time. Significance of the model terms was assessed by F -test statistics and associated p-values for each haplotype in each haplotype window and other fixed effects. For the linear model (2), the overall model fit for a particular haplotype window was assessed by R 2 values expressed as percentage. This explains the proportion of variance in disk calcification explained by the corresponding haplotype window. Since there is no equivalent expression for R 2 in the GLM framework, the logistic model fit was assessed by the RMD. The RMD represent residual effects not explained by the model; hence the lower the RMD the better is the model fit. For both the linear model and Frontiers in Genetics | Genetic Architecture GLM, the overall statistical significance was assessed by p-values.
It should be noted that linear models (2) were applied to binary case/control data as if they were normally distributed. It has been shown that linear models are quite robust to violation of normality in gene or QTL mapping and association studies and that it is simple to apply and interpret in many studies (Kadarmideen et al., 2000). However, we also applied statistically appropriate GLM to case/control binary data.

ACKNOWLEDGMENTS
This work was supported by the European Commission (LUPA-GA-201370); and a Faculty of Life Sciences, University of Copenhagen PhD stipend. The authors thank Olav Nørgaard and Majbritt Hansen for blood sampling. In addition we thank the Danish Dachshund Club and dog owners for contributing and supporting this study.