Distribution and Functionality of Copy Number Variation across European Cattle Populations

Copy number variation (CNV), which is characterized by large-scale losses or gains of DNA fragments, contributes significantly to genetic and phenotypic variation. Assessing CNV across different European cattle populations might reveal genetic changes responsible for phenotypic differences, which have accumulated throughout the domestication history of cattle as consequences of evolutionary forces that act upon them. To explore pattern of CNVs across European cattle, we genotyped 149 individuals, that represent different European regions, using the Illumina Bovine HD Genotyping array. A total of 9,944 autosomal CNVs were identified in 149 samples using a Hidden Markov Model (HMM) as employed in PennCNV. Animals originating from several breeds of British Isles, and Balkan and Italian regions, on average, displayed higher abundance of CNV counts than Dutch or Alpine animals. A total of 923 CNV regions (CNVRs) were identified by aggregating CNVs overlapping in at least two animals. The hierarchical clustering of CNVRs indicated low differentiation and sharing of high-frequency CNVRs between European cattle populations. Various CNVRs identified in the present study overlapped with olfactory receptor genes and genes related to immune system. In addition, we also detected a CNV overlapping the Kit gene in English longhorn cattle which has previously been associated with color-sidedness. To conclude, we provide a comprehensive overview of CNV distribution in genome of European cattle. Our results indicate an important role of purifying selection and genomic drift in shaping CNV diversity that exists between different European cattle populations.


INTRODUCTION
Copy number variation (CNV), which is defined as large-scale losses and gains of DNA fragments, forms one of the major classes of genetic variation (Zhang et al., 2009). In terms of total bases involved, CNV affects a larger fraction of genome compared to Single Nucleotide Polymorphisms (SNP) (Redon et al., 2006). In addition, estimates from several autosomal dominant diseases have indicated a higher de novo locus-specific mutation rate for CNV than for SNP (Lupski, 2007a). Copy number losses and gains of genetic sequences roughly make up 5-10% of the human genome, and many of the regions affected are associated with phenotypic variations including susceptibility to specific diseases (Redon et al., 2006;Stankiewicz and Lupski, 2010;Zarrei et al., 2015). For example, the presence of lower CCL3L1 gene copy number compared to the population average is associated with contracting HIV and developing AIDS (Gonzalez et al., 2005). Differences in copy number of genomic segments can result in changes in gene expression and phenotypic variation through gene disruption and altering gene dosage. For example, duplication of the APQ7 gene in human has been linked with the emergence of traits related to endurance running as a consequence of increased expression (Dumas et al., 2007;Lupski, 2007b).
A number of studies have been undertaken in various domestic animals to characterize CNV and their effects on phenotypes (Chen et al., 2012;Liu et al., 2013;Bickhart et al., 2016;Zhu et al., 2016). For example, duplication of a set of fibroblast growth factor (FGF) genes and the ORAOV1 gene in Rhodesian and Thai Ridgeback dogs causes a characteristic dorsal hair ridge (Salmon Hillbertz et al., 2007). A partial or complete duplication of the KIT gene causes different patterns of white coat coloration in pigs and in some of the cattle breeds such as White park, Galloway and Belgian blue (Pielberg et al., 2002;Durkin et al., 2012;Brenig et al., 2013). Similarly, white coat color in sheep has been associated with a duplication of the ASIP gene (Norris and Whan, 2008). Because CNV is known to be common in regions of the genome that regulate important physiological functions, they have also been studied for association with economically important traits in domesticated animals, such as milk production and fertility. For example, several CNVs associated with milk production traits in Holstein Friesian (HF) cattle have been identified .
The aurochs (Bos primigenius primigenius) is the ancestor of European cattle. Although the wild ancestor no longer exists, many extant European cattle breeds still possess primitive, aurochs-like features. These breeds are often referred to as primitive cattle breeds (Upadhyay et al., 2016). By contrast, commercial cattle breeds, including Holstein-Friesian (HF), Brown Swiss (BS) and Jersey, display derived phenotypic traits such as polledness and early maturity. It is likely that some of these differences in traits between primitive and modern cattle may result from CNV. Systematically assessing CNV between commercial and primitive cattle breeds might, therefore, reveal the genetic changes responsible for phenotypic differences, which have accumulated throughout the domestication history of cattle as result of natural and artificial selection. Moreover, contrasting cattle populations from different European regions may provide insight into the role of CNVs in population differentiation. Also, as previous studies (Park et al., 2015;Upadhyay et al., 2016) have indicated a higher frequency of aurochs specific alleles in North-western European cattle breeds compared to Balkan and Italian cattle breeds probably as a result of secondary contact between local aurochs and ancestor of domestic cattle, it is possible that comparing CNV patterns between cattle of different regions may reveal unique CNV that might have been introgressed into the ancestors of present domestic European cattle during this secondary contact. Hence, the objectives of the current study are to provide comprehensive overview of CNV distribution in various European cattle populations and to carry out comparative evaluation of CNV patterns between cattle populations originating from different regions of Europe and thus, to address the role of CNVs in population differentiation.

Sample Collection and Genotyping
A total of 196 animals from 38 different cattle breeds, consisting mainly of primitive cattle breeds (27 breeds), were sampled. Numbers of animals per breed varied from 1 to 6 (Supplementary Table S1), except for Holstein-Friesian (HF), for which 55 animals were used to identify CNVs. The geographic origin of 189 animals was assigned to one of five regions or breed groups: British and Irish (BRI), Dutch (NLD, ancestral to large parts of the Lowland Pied and Baltic Red cattle), Balkan and Italian (BAI, representing Podolian and Busha cattle), Iberian (IBR), and Alpine (ALP, combining the Central Brown and Spotted breed cluster). Because five Heck (HE) cattle did not belong to any of these groups and geographic origin of two samples were not assigned confidently they were not categorized in any of these five groups.
DNA was extracted from hair roots, blood or semen. Hair and blood samples were collected by a veterinarian in accordance with EU legislation. Semen samples were obtained from commercial AI services. The research did not involve experimentation on animals requiring approval of Animal Experiments Committee (DEC), Netherlands. The samples were genotyped with the Illumina BovineHD Genotyping BeadChip, which contains 777,692 SNPs uniformly spanning the bovine genome. All the SNP were clustered and analyzed using the Illumina BEADSTUDIO software (2.0).

Identification of CNVs
We used the PennCNV software to identify CNVs (Wang et al., 2007). A Hidden Markov Model (HMM) algorithm as employed in PennCNV incorporates multiple parameters, such as total signal intensity (LRR), allelic intensity ratio (BAF) values of each marker for each individual, and the population frequency of B allele (PFB) of SNPs. Both the LRR and BAF values of each marker for each sample were generated from the Illumina Genome Studio software package using the default clustering file (Illumina Inc, United States). The PFB was calculated as the average BAF for each marker in this population. We only used autosomal markers for detection of CNVs. The chromosomal positions of the SNPs were derived from the bovine UMD3.1 genome sequence assembly. To reduce the number of false positives, the LRR of each SNP was adjusted for the GC content of 1 Mb window surrounding the SNP using the '-gcmodel' option as employed in PennCNV. The PennCNV algorithm (with options: -test) was applied to all 29 autosomes (with option: -lastchr 29) to detect cattle CNV. All samples with standard deviation of LRR > 0.30, standard deviation of BAF > 0.001 and wave factor > 0.05 were discarded for downstream analysis. Finally, 47 low quality samples were discarded from further analysis. Of the remaining samples (Supplementary Table S1), a CNV was included in the downstream analysis if it spanned minimum of 3 SNPs (default). Furthermore, a CNV region (CNVR) was defined as a union of overlapping CNVs detected in two different samples (Redon et al., 2006). The identification of CNVRs was performed using a custom python script.

Comparison of Cumulative CNV Counts and CNV Size
To compare differences in CNV counts and CNV size between the five major breed groups, we first removed the samples showing outlier values (mean ± 3 standard deviations) for either CNV counts, or CNV size, or both, if more than five samples represented a breed. In the second step, we used the Kruskal-Wallis test to assess overall differentiation of cumulative CNV counts, and One Way ANOVA to assess overall differentiation of cumulative CNV size among the five breed groups of cattle. If the overall P-value was significant (P < 0.05), we performed a post hoc Mann-Whitney test for cumulative CNV counts and a t-test for cumulative CNV size to assess pairwise population differences followed by Bonferroni correction for multiple testing.

qPCR Validation
Copy number variations were validated by Real-time qPCR using the 7500 Fast and RT (Real-time) PCR system (Applied Biosystems). Primers were designed using the Primer3 webtool 1 . Information about Primers and samples used in qPCR are given in Supplementary Table S2. All PCR primers were designed from UMD 3.1 reference genome based on the first 1000 bp regions of CNVs. Before PCR, the quality and quantity of every DNA sample was measured with Qubit R 2.0 Fluorometer. PCR amplifications were performed in a total volume of 12.5 µL containing reagents described in Supplementary Table S2. The BTF3 gene was chosen as internal standard in all qPCR experiments.

Hierarchical Clustering of CNVR Data
To cluster samples according to their CNV similarities, we made a vector of "0"s and "1"s for each individual based on presence or absence of a specific CNVR in that particular individual. A hierarchical clustering was performed using the DendroUPGMA. 2 We used Jaccard index as a distance measure 1 http://frodo.wi.mit.edu/primer3/ 2 http://genomes.urv.cat/UPGMA/ and the unweighted pair-group method with average mean (UPGMA) as the clustering method.

Identification of Segmental Duplications
We applied Whole Genome Assembly Comparison (WGAC) to identify intrachromosomal segmental duplications as described previously (Khaja et al., 2006). In the first step, we retrieved the Repeat-masked UMD 3.1 cattle genome assembly from the UCSC database 3 . Subsequently, we used MegaBlast to perform sequence similarity searches within the assembly. Finally, we retrieved all paralogous sequences which displayed >90% similarity and with a size of >5 Kb.

Assessment of Population Differentiation Using V st
To detect population differentiated CNVs between cattle populations of different regions, V st was calculated. We calculated pair-wise V st as defined previously (Redon et al., 2006) by using the equation: (V s −V T )/ V T , where V T is the total variance in mean of LRR of a probe across all individuals of two populations and V s is the average variance of a probe in samples within each breed. We used a window-based approach to identify groups of minimally 3 SNP probes, each showing significant V st (V st > 0.35) with the window shift of a single SNP probe. Finally, we only referred to a CNV as population differentiated if it contained the group of SNPs identified as having significant V st in this window based approach.

Gene Contents and Functional Annotation
The unique cattle gene list based on UMD 3.1 was retrieved from Ensembl biomart (Cow release 84). The PANTHER classification system was used to assess the probability of overrepresented genes in CNVRs within biological process, cellular component, and molecular function using Bonferroni correction for multiple comparisons (Mi et al., 2005).

Overview of Copy Number Variation (CNV) across All Groups
A total of 9,944 autosomal CNVs were identified in 149 European cattle samples (Supplementary Table S3). Out of 9,944 CNVs, 1,941 were identified as singletons (Supplementary Table S4), while the remaining CNVs had minimally a base overlap with at least one CNV identified in another sample. The average number of CNV identified per sample was 67. For the different breed groups, BAI, BRI, IBR, NLD, ALP, the average number of CNV per sample was 80, 79, 70, 58, and 55 respectively. We found overall significant differences (P < 0.05, Kruskal-Wallis test) as well as pair-wise population differences (P < 0.05, post hoc Mann-Whitney test) in the average number of identified CNV per individual in each of the five major breed groups ( Figure 1A). Excluding singletons from the comparison also resulted in significant differences in average number of CNVs between the major breed groups ( Figure 1B). On the other hand, despite overall significant differences (P < 0.05, One way ANOVA) in average cumulative length of CNVs per individual in each of the five major breed groups, pair-wise post hoc t-test did not result in a significant difference between any of these groups.

CNV Validation Using qPCR
To validate CNVs identified in the present study, we performed quantitative PCR (qPCR) assays for 18 CNV loci in 33 animals. These loci were chosen to represent different copy number states (loss, gain and complex) and different frequency ranges (from singleton to multiple individuals). Of the 18 CNVs tested, 13 could be confirmed by qPCR. The copy number estimated by qPCR for all confirmed CNVs, agreed with the state estimated by the PennCNV analysis. Of the 5 nonvalidated CNVs, 3 were identified only in one animal, while the remaining 2 were identified in at least two animals. Only 1 of the 5 non-validated CNVs did not amplify because of the poor DNA quality of samples, while non-amplification of the remaining CNVs indicated normal copy instead of hemizygous deletion as identified by PennCNV for all locus. These results also indicate high likelihood of singleton CNVs, i.e., CNVs that occurs only once in the dataset, being false positive (Supplementary Table S2).

Overview of CNVRs
A total of 923 CNVRs were identified by aggregating overlapping CNVs with overlap identified in at least two animals (Figure 2,  Supplementary Table S5). These CNVRs cover 61.06 Mb of the cattle UMD 3.1 genome assembly, which corresponds to ∼2.5% of the 29 bovine autosomes. However, the distribution of CNVRs across all autosomes varies considerably, with the highest number (55) on chromosome 1 and the lowest (10) on chromosome 27. The estimated length of CNVRs varies from 1.53 Kb to 3.51 Mb, with an average of 66.15 Kb ( Figure 3A). The ratio of total estimated CNVR length per chromosome to the length of that chromosome varies from 8.20% for chromosome 12-0.040% for chromosome 26. Chromosome 23 displays the highest density of CNVRs with an average distance of 1.46 Mb between CNVRs, while chromosome 20 exhibits the lowest density of CNVRs with an average distance of 5.24 Mb between CNVRs. These 923 CNVRs are comprised of 587 losses, 179 gains and 157 complex (both loss and gain) events. Furthermore, the frequency of these CNVRs in the populations under study ranged from 1.34% (present in 2 of the 149 animals) to 97.31% (present in 145 of the 149 animals). The 157 complex CNVRs, on average, displays much higher frequencies (∼13%) than the average frequency of only losses (∼3.6%) or only gains (∼3.4%) CNVR events. One complex CNVR (id: CNVr527) displays the highest frequency (97.31%) and is located on chromosome 12 between 73.2 and 76.7 Mbp. Out of all 923 CNVRs, 198 CNVRs (21%) have a frequency of more than 5% in the studied cattle populations (Supplementary Table S5). Of these 198 CNVRs, 49 were identified in at least one individual of each of the five breed groups (Supplementary Table S5). The BRI and NLD populations revealed the highest (26) and the lowest (8) number of CNVRs per sample respectively ( Figure 3B).

Hierarchical Clustering Based on CNVRs
To assess population differentiation, hierarchical clustering was carried out after converting CNVRs into binary data (see Materials and Methods). Samples belonging to low diversity breeds, as identified in our previous SNP based studies FIGURE 2 | Distribution, status and frequency of CNVR in the bovine genome (UMD 3.1). The status of CNVRs are shown in outer circle in Red (loss), Green (gain) and Blue (both), while the inner circle is indicative of the frequency. (Upadhyay et al., 2016), such as English longhorn (EL) and Boskarin (BK), cluster together (Supplementary Figure S1). However, samples belonging to IBR breeds show a lower degree of differentiation compared to samples belonging to the other regions. Also, the breeds from the same geographic region do not cluster together. For example, BS (an Alpine breed) samples form a clade with the Iberian samples belonging to Pajuna (PA), indicating sharing of high frequency CNVRs.

Association between High Frequency CNVRs and Intra-Chromosomal Segmental Duplications (SD)
The large proportion (>21% of total) of CNVRs displaying a frequency above 5% indicates that either; (1) they fall into regions of CNV hot spots in the cattle genome, (2) these are likely to be of more ancient origin when compared to the low frequency CNVRs or (3) they are under strong positive selection. CNV hot spot regions in the genome usually overlap with genomic segmental duplications (SDs) (Sharp et al., 2005). Thus, to assess the correlation between SDs and high frequent CNVRs, we identified SDs from the UMD 3.1 bovine genome assembly. In the analysis, we only considered intra-chromosomal SDs longer than 5 Kb, as these SDs are more likely to cause misalignment and aberrant recombination (Stankiewicz and Lupski, 2002) than the small size SDs (<5 Kb). Of the 21 CNVRs that overlapped with large SDs, 14 were present at a frequency of more than 5% across all cattle populations (Supplementary Table S6).

Assessment of Population Differentiated CNVs Based on V st
To identify CNVs contributing to population differentiation, we calculated the pairwise V st between every possible combination  of the five major breed groups, and between, HF and the four major breed groups. In the latter V st analysis comparing HF samples with other breed groups we did not include NLD, as large proportion of its sample size consisted of HF samples. The value of V st varies between 0 (no population differentiation) and 1 (complete population differentiation), with high V st values suggesting a difference between populations in the frequencies of copy number states of underlying sequences. Interestingly, we observed a very few breed group differentiated CNVs for all combinations except for HF vs. BRI, where we observed quite a few breed group differentiated CNVs. We also considered the effect of mis-assembly on lineage differentiated CNVs as it has been reported previously that incorrect placement of the sequence from the sex chromosome on autosomes may distort the dosage ratio between male and female which can lead to false positive lineage population differentiated CNVs . Clearly, except for three CNVRs, all CNVRs identified as breed group differentiated (in HF vs BRI) displayed difference in LRR values between bull and cow samples (Supplementary Figure  S2 and Table S7). Thus, the high V st observed between BRI and HF for CNVs involving miss-assembled SNPs can be attributed to the highest proportion of female samples in the BRI group, while all HF samples were male.

Gene Content of CNVRs
Of the 923 CNVRs identified in the present study, 415 (∼45%) span (with at least one bp overlap) 916 unique cattle genes (Supplementary Table S8). Interestingly, genes from certain immune response related gene families such as the TRAV like gene family, and IGL were covered by complex CNVRs indicating differences in copy number between different cattle breeds. The most frequent CNVR (id:CNVr527) covered four related genes, which are located in tandem and display similarity to ABCC4 in human. We also found genes related to economically important traits of livestock covered by CNVRs such as MTHFSD, and GTF2I. In addition, CNVRs also covered some essential genes such as MSH4 and ATF2 (Supplementary Table S9). The gene ontology (GO) analysis for 916 ensemble unique genes using the Panther classification system (REF) showed that terms related to immunity and olfactory activity were overrepresented in the CNVRs that were identified in the present study (Supplementary  Table S10).
Interestingly, 5 cattle samples displayed a CNVR (id: CNVr2375) covering the KIT gene. However, occasionally, CNVs contributing to a CNVR may or may not fall in the gene covered by that CNVR. Hence, we investigated the CNVs within CNVr 2375. Detailed analysis of these CNVs revealed that of the 5 samples, 3 EL samples had a CNV gain covering the KIT gene, while the remaining samples had CNVs outside the gene. In addition, the estimated size and position of this CNV within CNVr 2375 ( Figure 4A) displayed similarity with the genomic segment involved in a serial translocation from chromosome 6 to chromosome 29 that was previously reported in Belgian Blue, Galloway and White Park cattle breeds (Durkin et al., 2012;Brenig et al., 2013). Thus, to test the presence of that same serial translocation overlapping the KIT gene, we PCR amplified the known fusion points of the translocation (for more details refer to Durkin et al., 2012). The amplification of fusion points (α-D,E-A and C-β) confirm the presence of the Belgian Blue type allele (Cs29) in 2 EL samples (Figure 4B), whereas the remaining sample did not amplify due to poor DNA quality. The White Park cattle samples, which were discarded from the analysis due to high standard deviation in LRR, also reveal the presence of the Cs29 allele. These results show a high prevalence of the Cs29 allele in BRI cattle breeds.

Difference in Abundance of CNV Counts between Different Cattle Populations
We found significant differences in CNV counts between different cattle populations. The BAI and the BRI breed groups displayed relatively high number of CNVs per individual compared to the breed groups from other regions. The mean CNV cumulative length per individual between different cattle groups, however, was not significantly different. Such difference in CNV abundance between different cattle populations have already been reported previously. For instance, high CNV abundance has been reported in indicine and African taurine cattle breeds than in European taurine, which has been attributed to their breed divergence and population history . Similar observations were also reported in other species as well. For instance, Pezer et al. (2015) reported difference in total CNV abundance and total genic deletions between several natural populations of the house mouse, which they attributed to difference in effective population size between mouse populations. It is evident from these studies that population history such as change in past effective population size, gene flow, and selection process may contribute to differential CNV abundance between different populations. Thus, we hypothesize that persistence of small effective population size over many generations in BRI breeds such as EL and Highland (HL) may have resulted in relaxation on purifying selection against slightly deleterious CNVs and which in turn, may have resulted in accumulation of large number of CNVs and genic deletion events. To test this hypothesis, we calculated the number of genic deletion CNVs, percentage of genic CNVs as well as cumulative genic CNV length in breeds with more than or equal to 3 samples. (Figure 5 and Supplementary Figure S3). Indeed, we observed a higher number of genic deletion as well as higher cumulative length of genome under deletion in BRI breeds [English Longhorn (EL) and Highland (HL) (Highland)] compared to breeds from other regions. This observation supports the hypothesis that genetically isolated small populations may accumulate abundance of CNVs, in particular, deletion CNVs. However, we note that, since some SNP arrays display bias toward detection of deletions (Pinto et al., 2011) and the present study suffers from low sample size per breed, large samples from multiple breeds are needed to be sampled to explore this hypothesis further.
On the other hand, within population variability in number of genic and total number of CNVs in several BAI and IBR breeds may be attributed to high admixture pattern of their genomes (Decker et al., 2014;Upadhyay et al., 2016) or higher historical effective population size compared to NLD, ALP or BRI breed-groups.

Comparison of Identified CNVRs with Previous Studies
To characterize the CNVRs identified in the present study in more detail, we compared them to the CNVRs identified in eighteen previous studies using various methods such as whole genome sequencing (Stothard et al., 2011;Boussaha et al., 2015;Bickhart et al., 2016;Ben Sassi et al., 2016), comparative genomic hybridization (Fadista et al., 2010;Liu et al., 2010;Kijas et al., 2011;Zhang L. et al., 2014;Zhang et al., 2015), Illumina Bovine HD BeadChip Arrays Jiang et al., 2013;Zhang Q. et al., 2014;Sasaki et al., 2016), and Illumina Bovine 50K SNP array (Bae et al., 2010;Hou et al., 2011;Jiang et al., 2012). The comparisons revealed that 737 (80% of the total) number of CNVRs detected in the present study overlapped completely or partially (by at least a single bp overlap) with CNVRs from these previous studies detected in literature ( Table 1). The inconsistency of overlaps with different studies can be attributed to differences in size and structure of populations under investigation, platforms and algorithms of CNV calling, definitions of CNV and CNVR between various studies. As expected, high overlaps are reported in studies that have investigated CNVs in diverse cattle breeds using the Illumina Bovine HD BeadChip array ( Table 1).
The identification of 186 Novel CNVRs suggests that a substantial number of CNVs in cattle genomes are yet to be identified (Supplementary Table S11). Of the 186 CNVRs (∼20% of the total) identified as novel in this study, 49 are breed specific CNVRs for various breeds. Of the 49 private CNVRs, 27 are observed only in HF. We note that HF might have shown the highest percentage of breed specific CNVRs due to the larger sample size investigated in our study. However, despite the small sample size, all BRI breeds displayed at least one breed specific CNVR. Interestingly, the EL displayed quite a few breed specific private CNVRs (6) followed by Heck (HE), which displayed 4 breed specific CNVRs.

Sharing of Highly Frequent CNVRs and Low Differentiation between European Cattle Populations
Hierarchical clustering of CNVRs revealed that animals which belonged to breeds with a relatively low diversity such as EL, Boskarin (BK), Dutch Friesian (DF), Maltese (MT), and Heck (HE) formed a clear cluster. This type of a clustering pattern suggests that low diversity led to an increase in shared CNVRs between animals. For example, the Boskarin (BK) breed displayed more than 40% of the total CNVRs as shared between two or more samples (data not shown). However, unlike SNP based clustering of these same samples in our previous study (Upadhyay et al., 2016), samples from the same region did not cluster together (Supplementary Figure S1). This indicates a relatively low level of differentiation or sharing of high frequent CNVRs among different European cattle breeds. This discordance in inference of European cattle population structure indicates that either our analysis suffered from low sample size per breed or most CNVs are transient enough to not have followed the same pattern of demographic events in the history of cattle domestication that typical neutral genetic variants have experienced. In addition, it can be speculated that de-novo CNVs, CNV hot-spot regions in the genome and false CNV calls due to variation in genotyping intensities can also affect inference of population stratification.
To investigate the highly frequent CNVRs in more detail, we performed their association with SDs. It has already been shown that SDs provide substrate for non-allelic homologus recombination (NAHR), which in turn, produces novel chromosomal rearrangements and copy number changes. Therefore, CNVRs that overlap with SDs typically display high frequencies as compared to the CNVRs that do not overlap SDs. Accordingly, we found an enrichment of highly frequent CNVRs in cattle SDs as previously observed in human, mice, and apes (Redon et al., 2006;Gazave et al., 2011;Pezer et al., 2015).
Recently, Sudmant et al. (2015), while analyzing CNV patterns across different human populations and a Denisovan sample, identified large duplications that introgressed from the extinct Denisovan lineage exclusively into Oceanic population, and also were present at high frequencies. Since north-western European cattle breeds harbor high frequency of aurochs' specific alleles, probably as a result of secondary aurochs introgression (Park et al., 2015;Upadhyay et al., 2016), we investigated whether animals from these regions carry any unique CNVs. However, V st based analysis did not identify any region specific CNV that might have introgressed from aurochs during secondary contact, i.e., CNV present only in animals of certain regions as a result of aurochs introgression. In the future, the availability of high-coverage sequence from archaic aurochs samples might aid researchers in identification of ancient CNVs in the genome of European cattle. Additionally, V st based analysis also did not identify any breed-group differential CNVs, when contrasting HF (commercial breed) against IBR or BAI animals. This observation is consistent with a recent study on bovine population structure, where authors reported only few lineage-specific CNVs in breeds from the same continent, i.e., Holstein and Angus cattle breeds . However, quite a few breed-group differentiated CNVs between HF and BRI were identified, except a few, all of which turned out to be false positive CNVs.

Copy Number Variable Genes
Cattle genomes display enrichment of CNVs in genes related to immune response and environmental interaction such as sensory perceptions of smell and chemical stimuli. Many of these immune related genes appeared to be copy number variable between different cattle breeds. These variations may explain inter-population differences in immunological response to different clinical conditions. For example, BoLA-DRB3 locus which partially lies within a high frequency complex CNVR (id: CNVr1586, Supplementary Table S8) has been associated with differential response to various clinical conditions such as Mastitis and Bovine leukemia virus infection in various cattle breeds (Rupp et al., 2007;Yoshida et al., 2012). Another interesting example is the CIITA gene, which lies within a high frequency gain CNVR identified in multiple breeds (id: CNVr1680, Supplementary Table S8) and which was found to be duplicated in Angus cattle that showed nematode resistance . On the other hand, enrichment of CNVs in genes related to sensory perception of smell is either indicative of physiological requirements of domestic animals as described previously in case of pig (Paudel et al., 2013) or can, alternatively, be the result of drift due to random duplication and deletion of olfactory genes (OR) (Nozawa and Nei, 2007). Similar overrepresentation of CNVs in immune related genes and OR genes was reported previously in various species of domestic animals (Paudel et al., 2013;Liu et al., 2013;da Silva et al., 2016). The most frequent CNVR displayed variable copy numbers and covered genes similar to ABCC4 in human. The ABCC4 genes encode a protein related to ATP-binding cassette (ABC) transporters which transport various molecules across extraand intra-cellular membranes. Previous studies have reported association between ABCC4 genes and phenotypic traits related to disease resistance and feed efficiency in cattle Chen et al., 2012). Interestingly, Lee et al. (2013), using whole genome sequencing data of Hanwoo cattle, reported a very high number of non-synonymous SNPs, splice-site variants, and coding indels in ABCC4 gene. These observations imply that either ABCC4 gene has evolved into multiple copies for environmental adaptation, or, alternatively, mis-assembly at chromosome 12 led to distortion of signal intensity ratio resulting in detection of false CNVRs. The CNVR1206 that covered the MTHFSD gene, has been associated with milk protein yield in Spanish HF cattle (Ben Sassi et al., 2016), while the GTF2I gene that lies within CNVR1703, has been identified as a candidate gene related to traits associated with feed conversion efficiency in chicken (Reyer et al., 2015). Novel CNVRs, i.e., CNVRs that were identified only in the present study, spanned important genes such as MSH4 and ATF2 etc. The MSH4 gene encodes a protein essential for reciprocal recombination and proper segregation of homologous chromosomes at meiosis. Additionally, deficiency of MSH4 gene, which is covered by CNVr1975 (Supplementary  Table S8) and identified only in two animals, has been associated with impaired gamete formations in laboratory mice (Kneitz et al., 2000). Recently, Ma et al. (2015) and Kadri et al. (2016) also have associated the MSH4 gene with recombination rate in cattle. Also, deficiency of the ATF2 gene, which is partially covered by CNVr1217 and identified only in four animals, led to early postnatal lethality in laboratory mice (Bhoumik and Ronai, 2008). Since both these CNVRs (id: CNVr1975 and CNVr1217) have been identified in low frequency and in heterozygous deletion state, the hypothesis of purifying selection against such deleterious CNVs cannot be ruled out.
Recently, Durkin et al. (2012) have shown that a duplication of a KIT gene segment from chromosome 6 and its aberrant insertion on chromosome 29 led to the "color-sided" white coat color phenotype in Belgian blue cattle. Additionally, Brenig et al. (2013) identified the Belgian-blue type allele (Cs29) in White Park and Galloway cattle. Interestingly, they also suggested a dose dependent effect of Cs29 in which heterozygous (Cs29/wild allele) animals exhibited variable degrees of pigmented spots on white body trunk and homozygous (Cs29/Cs29) animals produced no pigmentation on the body trunk. In our study, we show the presence of the Belgian Blue type allele in EL cattle, most likely introduced in the EL due to cross breeding with other cattle breeds such as Galloway and White park that also carry the same allele. In addition, as both EL animals were homozygous for Cs29 and as this breed harbors low diversity, we hypothesize that frequency of Cs29 allele is high in this breed.
In summary, we utilized signal intensity data from Bovine Illumina HD genotyping array to identify CNV in cattle populations sampled from different regions of Europe. The comparative evaluation indicated a higher abundance of CNV counts in British and Balkan-Italian cattle breeds, probably because of high historical effective population size or relaxation on purifying selection of slightly deleterious CNVs. Also, clustering based on CNV regions displayed low population differentiation indicating the effect of transient CNVs or CNV hot-spot region. Functional analysis revealed enrichment of CNVR in genes related to immunological responses and environmental interaction such as sensory perceptions of smell and chemical stimuli. In addition, we also detected a CNV overlapping the Kit gene in EL cattle which has been identified previously in Belgian blue, white park and Galloway cattle and associated with color-sidedness.

DATA ACCESSIBILITY
The total signal intensity (LRR) and B Allele frequency (BAF) data of majority of the samples used in the present study are available from the Dryad Digital Repository: 10.5061/dryad.s57d6. The LRR and BAF data of HF samples used in the study are available to interested researcher upon the request.

ACKNOWLEDGMENTS
The authors would like to thank Bert Dibbits, Wageningen University and Research, Animal Breeding and Genomics, for DNA extraction and qPCR validation.

SUPPLEMENTARY MATERIAL
The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fgene.2017. 00108/full#supplementary-material TABLE S1 | Sampling information: the first column is breed code, the second column is breed name, the third and fourth columns displays information about country and region of origin for the breed, respectively, and the fifth column displays number of samples collected per breed. ALP, Alpine; BRI, British and Irish; NLD, Dutch; JE, Jersey; IBR, Iberian; BAI,Balkan and Italy.