Copy Number Variation Mapping and Genomic Variation of Autochthonous and Commercial Turkey Populations

This study aims at investigating genomic diversity of several turkey populations using Copy Number Variants (CNVs). A total of 115 individuals from six Italian breeds (Colle Euganei, Bronzato Comune Italiano, Parma e Piacenza, Brianzolo, Nero d’Italia, and Ermellinato di Rovigo), seven Narragansett, 38 commercial hybrids, and 30 Mexican turkeys, were genotyped with the Affymetrix 600K single nucleotide polymorphism (SNP) turkey array. The CNV calling was performed with the Hidden Markov Model of PennCNV software and with the Copy Number Analysis Module of SVS 8.4 by Golden Helix®. CNV were summarized into CNV regions (CNVRs) at population level using BEDTools. Variability among populations has been addressed by hierarchical clustering (pvclust R package) and by principal component analysis (PCA). A total of 2,987 CNVs were identified covering 4.65% of the autosomes of the Turkey_5.0/melGal5 assembly. The CNVRs identified in at least two individuals were 362—189 gains, 116 losses, and 57 complexes. Among these regions the 51% contain annotated genes. This study is the first CNV mapping of turkey population using 600K chip. CNVs clustered the individuals according to population and their geographical origin. CNVs are known to be indicators also of adaptation, as some researches in different species are suggesting.


INTRODUCTION
The domestication of the wild turkey appears to occur in Mexico between 200 B.C. and 700 A.D. (Crawford, 1992). The domesticated turkey has been introduced in Europe from Mexico and Central America starting in late 15 th century (Schorger, 1996) by the Spanish conquerors. The diffusion of the turkey population in the European territory was very fast, close to 50 km per year as indicated by Crawford (1992). The rapid diffusion in Europe was possibly facilitated because of their farming, as turkey was appreciated for its meat (Schorger, 1996). Then, since the 15 th century, the populations of European and Mexican turkey evolved independently for more than 500 years.
At present in Europe there is a clear differentiation in several turkey breeds, indicating that farmers and breeders have selected the turkey populations according to a directional goal for more than 5 centuries.
Additionally, in the last 40 years, companies developed a structured breeding plan to produce commercial hybrids selected to maximize meat production 1 .
The selection operated by farmers in the past 5 centuries in the Italian populations determined the appearance of strong variation in plumage colors, in body size and weight, differentiating the populations in breeds (Cavalchini, 1983). This differentiation was possibly also facilitated by the geopolitical structure of Italy in middle ages, structured in a large number of small states with very limited exchange of goods and populations, making each turkey population genetically isolated from the others. Plumage of these breeds spans from totally black (Nero d'Italia) to white with black streaks (Ermellinato di Rovigo), while it is generally bronze like or with bronze reflection in all the other Italian populations. Body size is also showing a considerable difference among the Italian breeds with male weight spanning from 4.5 to 6.5 kg in the Brianzolo and reaching 12 kg in the Ermellinato di Rovigo (Table 1). Due to the fact that local farming occurred for centuries, it is expected that genetic bottleneck occurred in the Italian populations. The Mexican turkey population has historically been farmed as a backyard population without any directional selection for centuries, with a plumage very variable in its color and a weight close to 6 kg in males. In fact, in this population, there is not any structured selection program, while its genetic peculiarity is a strong argument in favor of its conservation (Utrera et al., 2016). In the farming system birds are free to migrate, facilitating the exchange of genetics across the country, favoring the genetic variability occurring in the population thus contributing to its morphological homogeneity irrespectively from the geographical location. The Narragansett breed (NARR) originated in Rhode Island and was recognized as a breed at the end of the 19 th century. The NARR was originally developed in Rhode Island by colonies returning to America from Europe in 16 th century, bringing back turkeys of the Norfolk Black breed and crossing them with the native American ones (Ekarius, 2007).
In the last 40 years the intensive selection in turkey produced a fast-growing meat bird, a commercial hybrid (HYB). The selection for heavy turkey started presumably in North America and preferred the white pigmentation to other plumage colors (Christman and Hawes, 1999;Ekarius, 2007). Birds are selected according to a strong directional mating system to improve weight at slaughter and feed conversion efficiency. The hybrid population here used is a common commercial line of selected heavy turkey (white plumage) that reaches a weight of 20 kg or more in males.
Even though the directional selection occurring in European populations for more than 500 years determined that breeds differentiated in morphology and in performances, the European and central American populations share a common genetic background, because their common ancestral origin. This holds true also for commercial turkey line where, nevertheless, the intense directional selection performed in the last 40 years, affected dramatically the physiology, the adult weight, the growth rate, the behavior and the bird's sociality respect to the wild type 1 .
The Copy Number Variants (CNVs) are genomic structural variants recognized to have an active role in gene regulation (Redon et al., 2006;Gamazon and Stranger, 2015) and capable to identify genomic variation among populations. Their use in identifying genomic variation among populations is particularly relevant as several authors found a large proportion (up to 60% in chicken) of mapped CNVs Regions (CNVRs) harboring annotated genes related to expressed phenotypes caused by the specific evolution occurred in the populations Drobik-Czwarno et al., 2018;Strillacci et al., 2018).
The goal of this study is to produce the first CNV map in the Turkey species (Meleagris gallopavo) using high-density SNP chip information in several populations-the Mexican turkey, the Narragansett, six Italian breeds-and a commercial hybrid, and to produce a GO analysis of annotated genes in the mapped CNVRs. The strong directional selection occurring in high-producing hybrids, the one that occurred in the differentiation of the Narragansett and the Italian Turkey breeds, and the adaptive selection in the Mexican turkey population is then discussed according to the genes harbored in the CNVRs. The second goal of this study is to identify the existing variability among the breeds and populations using the mapped CNV, since knowledge of their genomic variation can be used to interpret the phenotypic variability.

MATeRIAls AND MeThODs sampling and sNP Chip Processing
A total of 115 biological samples from individuals belonging to six Italian breeds (Colle Euganei: CoEu -22; Bronzato Comune Italiano: BrCI -5; Parma e Piacenza: PrPc -15; Brianzolo: BR -32; Nero d'Italia: NI -31; Ermellinato di Rovigo: ErRo -10), 7 Narragansett turkeys, 38 commercial hybrids (HYB), 30 Mexican turkeys (MEX) were available from previous collections or deriving from other research projects and part of the University of Milan repository of animal samples. The University of Milan permit for the use of collected samples in existing bio-banks was released with n. OPBA-56-2016. The Mexican sample collection is part of the institutional Project "Identificación de los recursos genéticos pecuarios para su evaluación, conservación y utilización sustentable en México. Aves y cerdos. SIGI NUMBER 10551832012" coordinated with the activities of the Centro Nacional of Recursos Genéticos (CNRG) at Tepatitlán, Jalisco (México) 2 . Original owners of sampled individuals gave consent for re-use for research purposes. The study did not require any ethical approval according to national rules, according to EU regulation, as it does not foresee sampling from live animals.
The samples of the Italian breeds belong to individuals originally collected in different areas of North Italy (Veneto, Lombardia and Emilia Romagna), in nine small farms dedicated to the breeding of one or two breeds each. The MEX individuals were originally sampled across 12 different states of Mexico, characterized by various climatic and geographical environments. The individuals belong to backyard small groups, spread over many small farms. These birds, to the best of our knowledge, did not undergo any selection by the owners, who let them reproduce according to a naturally occurring random mating as they are raised as a backyard population. The Narragansett individuals were originally sampled from two family farms in North Italy. A brief description of each turkey population including a picture, the sampling geographical area, the plumage color, the adult body weight, and the fertility performance are reported in Table 1. The commercial hybrid comes from a unique farm in the Lombardia region in north Italy from the same batch of birds.
DNA extraction from feathers (Mexican samples) and blood (all others) samples were performed using ZR Genomic DNA™ Tissue MiniPrep (Zymo, Irvine, CA, USA) according to the procedures relative to different tissues. DNA was quantified using NanoQuant Infinite ® m200 (Tecan, Männedorf, Switzerland) and diluted to 50 ng/μl. Samples were processed on the Axiom ® Turkey Genotyping Array (Affimetrix) containing 634,067 SNPs. The Turkey_5.0 (GCA_000146605.1) genome assembly was used in this study as reference genome.
A quality control of raw intensity files using the standard protocol in the Affymetrix Power Tools package (www.affimetrix. com) was performed in order to guarantee high quality of obtained data. Default quality control settings, according to the manual (www.affimetrix.com), were applied to filter for lowquality samples, i.e. genotyping call rate <98% and Dish Quality Control <0.82.

CNVs Detection and subsequent Analysis
The Log R Ratio (LRR) and the B allele frequency (BAF) values were obtained using the Axiom ® CNV Summary Tool software. Outlier samples for LRR were identified using the SVS 8.4 software (Golden Helix Inc., Bozeman, MT, USA) through: i) the overall distribution of Derivative Log Ratio Spread (DLRS) values; and ii) screened according to GC content, which is correlated to a long-range waviness of LRR values by the wave detection factor algorithm as in Diskin et al. (2008). The CNV detection was performed on the data of birds passing quality controls on 30 autosomes, using two different calling algorithms: i) the Copy Number Analysis Module of SVS 3 , and ii) the Hidden Markov Model of PennCNV software 4 . In order to reduce the false-positive calls a consensus map of CNV obtained by the two algorithms was produced.
The CNV calling performed with SVS has been obtained using the univariate analysis based on LRR values, with the following options: univariate outlier removal, a limit of not more than 100 segments per 10,000 markers with a minimum of three markers per segment, and 2,000 permutations per pair with a P-value cut off of 0.005, according to the SVS 8.4 user manual.
The PennCNV calling (Wang et al., 2007) was based on LRR and BAF values using the default parameters: standard deviation of LRR <0.30, BAF drift as 0.01 and waviness factor at 0.05, and a minimum of three SNPs required to define a CNV. In addition, as to reduce the false calling rate function of the hmm parameter file proper of PennCNV, the CNV call was obtained using three different "hmm" files (agre.hmm, affygw6.hmm, hh550.hmm). The online PennCNV manual describes that the agre.hmm file produces an excess of false-positive calls respect to the default affygw6.hmm file (both specific for Affymetrix SNP array), which instead is known to produce a low number of CNV calls (i.e. excess of false negative) with respect to other calling software and algorithms. The hh550. hmm file (specifically developed for Illumina SNP arrays) has been considered in the calling process, because is based on a SNPs chip density closest to the one used in this study.
After the four CNVs detections (i.e. one for each hmm file and the one from SVS8.4), the outputs were compared, at individual level and within each population, using the -intersectBed command of Bedtools software (Quinlan and Hall, 2010). For each individual, the consensus_CNVs were defined as the length of the DNA tract full overlapping across at least two detections. CNVs were classified in loss (0 and 1 from the PennCNV output) and in gain (3 and 4 from PennCNV output) and were constant across the different callings.
CNV regions (CNVRs) at population level were obtained by merging consensus_CNVs that overlap by at least 1 bp using the -megeBed command of Bedtools (Quinlan and Hall 2010) in at least two birds. The identified CNVRs were classified as "breed_CNVRs" and "shared_CNVRs", when occurring in only one breed (i.e. BR, BrCl, CoEu, ErRo, NI and PrPc) or population (i.e. NARR MEX and HYB), or in at least two ones, respectively. CNVRs were classified within population in gain (all consensus_CNVs gain), loss (all consensus_CNV loss), and 3 http://goldenhelix.com 4 http://penncnv.openbioinformatics.org/en/latest/ complex (consensus_CNVs both gain and loss). Singleton CNVs were considered also to be singleton CNVRs.
Genes were annotated within the CNVRs using the NCBI Turkey_5.0 gene dataset (annotation Release 102), and the Bedtools "-intersectBed" command was used to catalogue these genes to the corresponding regions. Gene Ontology terms (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses were performed using the DAVID Bioinformatic Database 5 . Only LOC genes catalogued in NCBI Database as protein genes were considered.
Different approaches were used to disclose population structure and diversification of all turkey population. In order to provide the required input for different analyses two different matrices were built using CNV data: i) the first matrix (matrix_1) was built by assigning a value of "1" (presence of CNV), or "0" (normal state) to each sample-CNV for each CNVR, without considering the CNV state; ii) the second matrix (matrix_2) was built assigning the sample-CNV genotypes: "0" homozygous deletion, "1" heterozygous deletion, "2" normal state (absence of CNV in that region), "3" heterozygous duplication and "4" homozygous duplication. For details see Strillacci et al. (2018).
The Past software (Hammer et al., 2001) was employed to perform and visualize two principal component analyses (PCAs), the first based on the matrix_1 as input data, and the second based on matrix_2. In addition, two 3D PCAs were performed with the "rgl" package of R 6 on PCAs results. The pvclust R package was utilized using the same matrixes to carry out two Hierarchical Clustering Analyses (HCA) applying 10,000 bootstraps (Suzuki and Shimodaira, 2006).
The STRUCTURE Software v.2.3.4 (Pritchard et al., 2000;Falush et al., 2003) was used to represent the population structure of the populations studied, on the basis of matrix_1. We used the STRUCTURE admixture model without the LocPrior option and setting 5,000 as burning period and 10,000 as iterations, performing five replicates for each K value from 2 to 20 and assuming nine different populations. Structure Harvester software (Dent and vonHoldt, 2012) was used to obtain the best K values, on the basis of STRUCTURE results, providing the DeltaK values according to the heuristic method reported by Evanno et al. (2005). The STRUCTURE PLOT software (Ramasamy et al., 2014) was employed to graphically visualize each cluster assignment of the K obtained.

CNVs and CNVRs Maps
A total of 13 samples (5 NI, 2 PrPc, 4 MEX, 1 ErRo and 1 BR) were excluded during quality assurance: three because of high DLRS values, seven because of wave factor values, and three for their exceptionally high number of called CNVs. Consequently, the final CNV dataset was used for genomic variation analyses comprised a total of 177 turkeys.
The total number of CNVs called was 2,987 (Supplementary Table 1) and varied in terms of number and size among the individuals of each population, as reported in Table 2. CNVs ranged from 819 bp to 453.5 kb in size with an average length of 115.2 kb, covering a total length of about 41 Mb (4.65%) of the turkey genome (chromosomes 1-30). The BrCl and the HYB showed shorter average CNVs with respect to other populations, while in the MEX one the longest average CNVs length was found. The HYB birds are also the most homogeneous for the average length of CNVs (Figure 1). The MEX breed is the one with the largest number of CNVs per individual (i.e. 28) while the HYB is the one with the lowest (i.e. 10).
Duplications were higher than deletions in the majority of populations except for BrCI, CoEu, and NARR breeds, where the ratio gain/loss (losses are the sum of the total copy numbers 0 and 1; gains are the sum of the total copy numbers 3 and 4) are inverted as shown in Figure 1. The gain/loss ratio is similar in HYB, MEX, NI, and PrPc populations (about 65% vs. 35%), but the proportion of duplication and deletion are differently represented in the other populations. The CNVRs including at least 2 individuals were 362 counting 189 gains, 116 losses, and 57 complexes and their distribution on the chromosomes is shown in Figure 1.
Statistics of CNVRs for each population are reported in  Figure 1 shows the distribution of singleton among breeds/populations and the distribution of loss and gain across all populations and by breed/population. The largest proportion of CNVRs resulted in gain, i.e. 77% across all breeds/ populations, with a proportion of singletons of 64%. This result is consistent with the proportion of singleton identified in chickens by others (Yi et al., 2014;Gorla et al., 2017).
The Venn diagram (Heberle et al., 2015) shown in Figure 2 represents the amount of CNVRs shared among the populations, grouping them as ITA (all Italian breeds), NARR (the Narragansett), MEX (the Mexican turkey population), and HYB (the commercial cross). The reason of this grouping resides in the type of evolution of the populations: the Italian breeds are all highly selected for breed standard phenotypes and possibly highly inbred; the Mexican population has been under an outbreeding mating system, with no directional selection undertaken for centuries; the NARR is a cross between the wild American turkey and the US domestic Bronze turkey; the HYB is a commercial population obtained by a strong directionally selection for heavy body weight. Three CNVRs resulted as common to all populations and a large proportion of ITA CNVRs are shared with MEX and HYB-65 and 42 CNVRs, respectively.
In Table 4 the details of the 32 CNVRs detected in at least 10 samples and the genes lying in the same regions are reported. Among those, the three regions in common to all turkey populations, as shown also in Figure 2, are located on chr3 at 92,889,953-92,936,492 (CNVR_1126, gain), on chr4 at 26,993-164,704 (CNVR_1240, gain), and on chr4 at 68,446,449-68,522,752 (CNVR_1371, complex). In the CNVR_1371, the one also found in the largest number of individuals from all breeds, the CD8A gene that is related to immune and inflammatory response is annotated (Li et al., 1999). In the other two common regions, CNVR_1126 and the CNVR_1240, the FK1L and the TLR2A gene are annotated, respectively, and involved in immune and inflammatory response and in feather keratin multigene family with implication in feather evolution (Li et al., 2013;Velová et al., 2018).
The other two regions are shared by a large number of individuals of ITA breeds and have been detected on chr4 at 63,830,569-63,854,111 in CNVR_1357 (62 birds from ITA breeds) and CNVR_1358 (65 birds from ITA breeds). These two regions are both a loss, are very close on the genome being 13,382 bp apart, and have been detected in almost the same samples of the same ITA breeds. No genes are annotated within these two CNVRs. Ten CNVRs in Table 4 are common to ITA and MEX, five common to ITA and HYB, and only one in common between ITA and NARR. Among these regions, nine of them include genes (CNVR_163, CNVR_1243, CNVR_1246, CNVR_1598, CNVR_488, CNVR_644, CNVR_987, CNVR_1025). There are no regions shared only among HYB, NARR, and MEX.
The Venn diagram in Figure 2 shows in detail the distribution of CNVRs among the six Italian breeds. It is worthy to mention that the gene CD8A is in a CNVR common to all the Italian breeds (in the red circle). Among the 362 CNVRs a total of 140 were mapped only in one specific population, the breed_CNVRs, as reported in Supplementary Table 4. The mapped genes in any species and the corresponding references for each association studies, the associated phenotypes and the organism involved, are also indicated.
The largest number of breed_CNVRs occurred in the MEX turkey population with 45 regions followed by the NI with 33. The lowest number of breed_CNVRs was found in the BrCI and in the NARR with one and four breed_CNVRs respectively. The number of genes annotated in the breed_CNVRs was 26 and 21 in MEX and NI, while the number of genes in breed_CNVR in  other populations was between one and eight. The gene IMMPL2 is harbored by two breed_CNVRs, one in the BI (CNVR_69) and one in the NI (CNVR_70). The two regions are very close even if they do not overlap. The results of the GO TERM and KEGG pathway analyses obtained using DAVID considering the genes found in the 362 shared_CNVRs are reported in the Supplementary Table 5 into clustered and not clustered groups of genes.
Supplementary Table 6 contains the information generated from the KEGG and GO Term analyses using DAVID from breed_CNVRs. The information was obtained using Meleagris gallopavo as background species and integrated and confirmed using the Gallus gallus as background, in case of absence of complete information for the Meleagris gallopavo species.

Genetic Variability Across Turkey Populations
Two clustering analysis were performed based on two different matrices (matrix_1 and matrix_2) described herein before. Both the cluster dendrograms, Figure 3 based on matrix_1 and Figure 4 based on matrix_2, showed distinct clades grouping animals belonging to the same populations. It is interesting to note that MEX and NARR always clustered very close. Also, Italian breeds and the Hybrid group form well distinct clusters according to their origin.  In all the PCAs graphs in Figures 3B, C and 4B, C the clustering results show two main clades: NARR, MEX, and HYB were grouping closer, while the ITA breeds clustered in a separate one.
The STRUCTURE software was employed to infer population structure and gene flow of the individuals of the nine populations studied. We calculate a number of K from K = 2 to K = 20 to identify the true number of possible clusters (subpopulation) in which it is possible to divide the populations. The estimated likelihood [LnP (D)] values were used to find the ΔK to distinguish the break in slope of the distribution of LnP (D) values at the true K. The analyses identify K = 13 as the best likely K value, suggesting that the population could be divided into 13 genetic groups.
Even though K = 5 shows the second higher value (Supplementary Figure 2) it is not possible to well differentiate the populations as in K = 13. In fact, for K = 2 to K = 12 it is not possible to assign each population to a clear distinct cluster, while for K = 14 to K = 20 the high level of admixture in each of the population results in nonsignificant successive clustering.

DIsCUssIONs
The results from this study are likely reflecting the human action on turkey populations, i.e., its migration to Europe and then back to America, and the directional selection occurring in the last 40 years to produce a fast-growing heavy bird.
The study considers three main groups of birds that reproduce and adapt according to different constraints and environmental conditions. The MEX population developed in a natural environment, with no (or very little) intervention by humans in mating and with no (or very little) supplement of feed and harsh rearing conditions. The Italian populations are the result of a phenotypic selection operated by individual farmers in their small group of individuals and operated to obtain birds that best perform in the semi-extensive farming system (backyard with recovery availability and feeding supplement) that characterized the middle ages poultry system of Italy and Europe. The HYB population, in the last 40 years, has been heavily directionally selected, through very well-structured genetic improvement and breeding plans to improve weight and growing performances and to best perform in an artificially controlled environment with unlimited feed supplement.
Our study is the first CNV mapping in a worldwide turkey sampling, from populations collected across different continents, and disclosed similarities and variation in CNVs and CNVRs across the populations studied. Because of the diversity in their breeding history and actual farming environmental conditions the MEX, ITA, and HYB populations provide an interesting model to investigate CNV variation, and their relation to gene expression and rearing conditions. The CNVs, in fact, are widely recognized to be a non-neutral genomic structural variation related to positive and directional selection. The CNV has been recently successfully used in poultry to differentiate breeds and populations with different genetic background Strillacci et al., 2017;Sohrabi et al, 2018), as well as in other species (Xu et al., 2016;Strillacci et al., 2018). Interestingly in chicken Sohrabi et al. (2018) discuss long-term adaptation of animals to rural and hard rearing conditions in relation to a specific expressed trait linked to a CNV identified in the Creeper indigenous chicken local population that is adapted to the harsh environmental condition of southeastern Iran. Additionally, a recent study on a eukaryotic model (Hull et al., 2017) showed that environmental changes are accelerating adaptation through the stimulation of copy number variation and that this is not a random effect but has a cause-effect relationship. Perry et al. (2007) also demonstrated that directional selection due to starch diet (i.e. environmental factor) is increasing specific copies of the genes involved in starch metabolism producing as such CNV gains. The CNV difference among populations here is shown in particular by the variation in the number of CNV per bird that is the lowest in the HYB (10 on average) and the largest in the MEX (28 CNV) and by the CNV length that in the HYB is much less variable than in the other two group (ITA and MEX) of birds. These findings support the hypothesis that the variability in CNV (size and number), as in the MEX vs. the HYB, is possibly related to the different breeding and selection underwent in these populations and to the environmental conditions where they were farmed: MEX very harsh rearing, HYB controlled artificial environment and ad libitum feeding. The same holds true for the ITA vs. MEX and HYB. Most of the genes found do not show previous associations with any specific function or pathway in turkey, since association studies in turkey are only few, but most of those genes have been previously studied and linked to functions in other species such as chicken, pig, bovine, birds, mice, zebrafish and human, as reported in Supplementary Table 4.
Thirty-two regions were detected in at least 10 individuals, and 14 of them include 29 genes that are known to be involved in different traits in different species (Table 4), such as immune response (TLR2A and CD8A), feather evolution (FK1L), feed efficiency (PRKG1 and LMAN1), growth traits (TCF15, FAM110A), and residual feed intake (TACC1,  Table 4. There are three CNVRs in common among all the populations; one of them harbors the CD8A gene, which is known to have a role in the host immune and inflammatory response in chicken (Li et al., 1999). The polymorphism of the CD8A gene has been studied in five lines of turkey populations by Li et al. (1999) who found a loss of this gene in one half of the turkey of a studied line. This loss can be related to the CNVR_1371 found in this study where 34 CNVRs were loss and 34, gain. All the ErRo resulted to have a loss-CoEu had 12 loss (over 13 birds), BrCI 4 loss (over five birds), while other populations have a more balanced representation between loss and gain CNVs.
The TLR2A gene has been shown to be involved in the bird's evolution with a strong driving of TLR due to positive selection (Velová et al., 2018). It is interesting to note that our results show that CNVR_1240 include the TLR2A gene with only normal and gain state. Even if the question of the adaptive value of the TLR genetic variation is still unresolved, the results found here are supporting the hypothesis that positive selection is driving the evolution of the gene towards duplication of copies as proposed recently by Velová et al. (2018).
Other genes in the CNVRs found here (Supplementary Table 4) are associated with immunity and inflammatory response in mice (TCF7, ARHGEF5), chicken (VMO1, GUCY1A2, NBN), bovine (NEK11), and in all species (PARP15) as reported in previous studies (Wang et al., 2009;Daugherty et al., 2014;Strillacci et al., 2014;Jang et al., 2015;Lim and Song 2015;Zhu et al., 2015;Saelao et al., 2018;Velová et al., 2018). Among the genes reported in the Supplementary  Table 4, the IMMP2L gene lies in CNVR_69 which is common to NI and BR. This gene was associated with fertility in mice (Bharadwaj et al., 2014) and with collective behavior in zebrafish (Tang et al., 2018). The presence of this gene in a gain CNVR may have some link with the typical collective behavior of the turkey.

CONClUsIONs
This study represents the first CNV mapping using high-density SNP chip on turkey. It provides first insights into the genomic architecture of the turkey population, laying the groundwork for future structural variation investigation in turkey species. In this study we have focused on the CNV, a structural variation linked to phenotypic expression regulation, in order to identify similarities across populations of the structural genome covered by this large variation. The turkey populations are a unique resource to identify evolutionary processes affecting the structural genome since it is possible to access to populations under positive selection only and, on the other extreme, under heavy artificial selection. The most complete isolation of the MEX turkey population and the European ones together to the HYB provide a unique model to disclose the effect of the adaptation to environment and directional artificial selection performed by humans on the structural genome.

DATA AVAIlABIlITY sTATeMeNT
The raw data supporting the conclusions of this manuscript are included in the Supplementary Materials.

eThICs sTATeMeNT
The animal study was reviewed and approved by Università degli Studi di Milano -OPBA-56-216.