Genome-Wide Variation, Candidate Regions and Genes Associated With Fat Deposition and Tail Morphology in Ethiopian Indigenous Sheep

Variations in body weight and in the distribution of body fat are associated with feed availability, thermoregulation, and energy reserve. Ethiopia is characterized by distinct agro-ecological and human ethnic farmer diversity of ancient origin, which have impacted on the variation of its indigenous livestock. Here, we investigate autosomal genome-wide profiles of 11 Ethiopian indigenous sheep populations using the Illumina Ovine 50 K SNP BeadChip assay. Sheep from the Caribbean, Europe, Middle East, China, and western, northern and southern Africa were included to address globally, the genetic variation and history of Ethiopian populations. Population relationship and structure analysis separated Ethiopian indigenous fat-tail sheep from their North African and Middle Eastern counterparts. It indicates two main genetic backgrounds and supports two distinct genetic histories for African fat-tail sheep. Within Ethiopian sheep, our results show that the short fat-tail sheep do not represent a monophyletic group. Four genetic backgrounds are present in Ethiopian indigenous sheep but at different proportions among the fat-rump and the long fat-tail sheep from western and southern Ethiopia. The Ethiopian fat-rump sheep share a genetic background with Sudanese thin-tail sheep. Genome-wide selection signature analysis identified eight putative candidate regions spanning genes influencing growth traits and fat deposition (NPR2, HINT2, SPAG8, INSR), development of limbs and skeleton, and tail formation (ALX4, HOXB13, BMP4), embryonic development of tendons, bones and cartilages (EYA2, SULF2), regulation of body temperature (TRPM8), body weight and height variation (DIS3L2), control of lipogenesis and intracellular transport of long-chain fatty acids (FABP3), the occurrence and morphology of horns (RXFP2), and response to heat stress (DNAJC18). Our findings suggest that Ethiopian fat-tail sheep represent a uniquely admixed but distinct genepool that presents an important resource for understanding the genetic control of skeletal growth, fat metabolism and associated physiological processes.

Variations in body weight and in the distribution of body fat are associated with feed availability, thermoregulation, and energy reserve. Ethiopia is characterized by distinct agro-ecological and human ethnic farmer diversity of ancient origin, which have impacted on the variation of its indigenous livestock. Here, we investigate autosomal genome-wide profiles of 11 Ethiopian indigenous sheep populations using the Illumina Ovine 50 K SNP BeadChip assay. Sheep from the Caribbean, Europe, Middle East, China, and western, northern and southern Africa were included to address globally, the genetic variation and history of Ethiopian populations. Population relationship and structure analysis separated Ethiopian indigenous fat-tail sheep from their North African and Middle Eastern counterparts. It indicates two main genetic backgrounds and supports two distinct genetic histories for African fat-tail sheep. Within Ethiopian sheep, our results show that the short fat-tail sheep do not represent a monophyletic group. Four genetic backgrounds are present in Ethiopian indigenous sheep but at different proportions among the fat-rump and the long fat-tail sheep from western and southern Ethiopia. The Ethiopian fat-rump sheep share a genetic background with Sudanese thin-tail sheep. Genome-wide selection signature analysis identified eight putative candidate regions spanning genes influencing growth traits and fat deposition (NPR2, HINT2, SPAG8, INSR), development of limbs and skeleton, and tail formation (ALX4, HOXB13, BMP4), embryonic development of tendons, bones and cartilages (EYA2, SULF2), regulation of body temperature (TRPM8), body weight and height variation (DIS3L2), control of lipogenesis and intracellular transport of long-chain fatty acids (FABP3), the occurrence INTRODUCTION African indigenous sheep originated in the Near East. They arrived, in the first instance, in North Africa via the Isthmus of Suez by the seventh millennium before present (BP) (Marshall, 2000). These sheep were of thin-tail type and their dispersion southwards into East Africa followed possibly the Nile river valley and the Red Sea coastline (Blench and MacDonald, 2006;Gifford-Gonzalez and Hanotte, 2011). The second wave brought fat-tail sheep into North and Northeast Africa via two entry points, the Isthmus of Suez and the Horn of Africa across the straits of Bab-el-Mandeb, respectively. The fat-rump sheep are a recent introduction and represent the third wave of arrival and dispersal of the species into eastern Africa (Epstein, 1971;Ryder, 1983;Marshall, 2000).
Sheep fulfill important socio-cultural and economic roles in the Horn of Africa. In Ethiopia they provide a wide range of products, including meat, milk, skin, hair, and manure, and are a form of savings and investment (Assefa et al., 2015). Ethiopia hosts many indigenous breeds of sheep, with currently 14 recognized populations/breeds, which are defined based on their geographic location and/or the ethnic communities managing them (Gizaw, 2008). Based on structure analysis, Edea et al. (2017) showed that the five Ethiopian indigenous sheep populations they analyzed clustered together based on their geographic distribution and tail phenotypes.
Fat depots act as an energy reserve that allows sheep to survive extreme environments and conditions such as prolonged droughts, cold, and food scarcity (Atti et al., 2004;Nejati-Javaremi et al., 2007;Moradi et al., 2012). Based on the combination of tail type and length, Ethiopian indigenous sheep can be classified as short fat-tail, long fat-tail, thintail, and fat-rump sheep. The short fat-tail inhabit sub-alpine mountainous regions, the long fat-tail predominate in mid-to high-altitude environments and the fat-rump sheep occur in semi-arid and arid environments (Gizaw et al., 2007). These populations are considered to be adapted to their production environments and they represent an important model species for investigating and enhancing our knowledge on the genome profiles of environmental adaptation, tail morphology, and fat localization.
Different approaches, that contrast groups of fat-and thintail sheep, have been used to identify candidate regions and genes associated with tail formation and morphotypes. Moradi et al. (2012) identified three regions on chromosomes 5, 7 and X associated with tail fat deposition in Iranian breeds. Using two fat-tail (Laticauda and Cyprus fat-tail) and 13 Italian thintail breeds, Moioli et al. (2015) identified BMP2 and VRTN as the likely candidate genes explaining fat-tail phenotype in the studied populations/breeds. Zhu et al. (2016) detected several copy number variations intersecting genes (PPARA, RXRA, and KLF11) associated with fat deposition in three Chinese native sheep (Large-tail Han, Altay, and Tibetan). Several candidate genes with possible links to fat-tail development, i.e., HOXA11, BMP2, PPP1CC, SP3, SP9, WDR92, PROKR1, and ETAA1, were identified using genome scans that contrasted fat-and thin-tail Chinese sheep (Yuan et al., 2017). Whole genome sequencing of extremely short-tail Chinese sheep revealed the T gene as the best possible candidate, among other nine genes, influencing tail size, following its association with vertebral development (Zhi et al., 2018). There is, so far, no information on the genetic basis of variation in tail fat distribution and size in African indigenous sheep.
In this study, using the Ovine 50 K SNP BeadChip genotypes, we investigated the (i) genetic relationships and structure within and between Ethiopian indigenous sheep of different fat-tail morphotypes alongside other sheep populations and breeds from the Caribbean, European, Middle East, China and Africa, and (ii) candidate genome regions and genes associated with tail morphology, fat deposition and possible eco-climatic adaptation in African indigenous sheep. For the latter, 11 Ethiopian indigenous sheep of different fat-tail morphotypes and two populations of thin-tail sheep from Sudan were analyzed.

DNA Samples and SNP Genotyping
The sampling strategy targeted breeds of indigenous sheep from different geographic regions in Ethiopia (Table 1 and Figure 1). Geographic positioning system (GPS) coordinates were recorded for all the populations. We used altitude to determine the agro-eco-climatic zones of the geographic locations where the sheep were sampled. All efforts were made to include populations representing the different tail phenotypes found in Ethiopia. Twenty DNA samples from two thin-tail sheep, Hammari and Kabashi, were obtained from Sudan. Genomic DNA was extracted from 146 ear tissue samples, collected from 11 Ethiopian indigenous sheep populations, using the NucleoSpin R Tissue Kit (www.mn-net.com) following the manufacturers protocol. All 166 genomic DNA samples were genotyped using the Ovine 50 K SNP BeadChip assay. The assay includes 54,240 SNPs composed of 52,413 autosomal, 1449 X-chromosome and 378 mitochondrial SNPs, respectively.
Ovine 50 K SNP BeadChip genotypes of Caribbean, European, Middle East and Chinese, as well as western, northern and southern African sheep, respectively were obtained from

Quality Control and Genetic Diversity Analyses
The Sheep HapMap dataset were merged with the ones generated from Ethiopian and Sudanese sheep using PLINK v1.9 (Purcell et al., 2007). The data files for final analysis were generated after pruning the merged dataset of SNPs not mapping on any autosomes, with a minor allele frequency (MAF) ≤0.01 and animals and markers with ≥10 and 5% missing genotypes, respectively. This generated a dataset with 45,102 SNPs which were further pruned, using PLINK v1.9, to be in approximate linkage equilibrium to avoid the possible influence of clusters of SNPs on population genetic relationship and structure analysis (Yuan et al., 2017). Following the latter pruning, 34,088 SNPs were retained for population relatedness and structure analysis. To minimize the possible loss of informative SNPs for selection signature analysis, the data for Ethiopian and Sudanese sheep was extracted from the dataset of 45,102 autosomal SNPs, that was obtained prior to LD pruning.
The proportion of polymorphic SNPs (Pn), expected (He), and observed (Ho) heterozygosity and inbreeding coefficient (F) were estimated for each population and across all populations using PLINK v1.9, to evaluate the levels of genetic diversity present in Ethiopian and Sudanese sheep, respectively.

Population Genetic Analyses
Principal component analysis (PCA) were performed using PLINK v1.9 to investigate the genetic structure and relationships among the studied breeds based on genetic correlations between individuals. A graphical display of the first two principal components (PC1 and PC2) was generated using GENESIS (Buchmann and Hazelhurst, 2014). Admixture analysis implemented in ADMIXTURE v1.3 (Alexander et al., 2009) was used to investigate underlying genetic structure and estimate the proportion of shared genome ancestry between the study populations. A 5-fold cross-validation procedure following Lawal et al. (2018), was used to determine the optimal number of ancestral genomes (K) and proportions of genome ancestry that was shared among the study populations.
To further evaluate historical relationships and interactions (gene flow) within and between Ethiopian and Sudanese populations, we used the maximum likelihood tree-based approach implemented in TreeMix (Pickrell and Pritchard, 2012) and included the Soay sheep as an out-group. The number of migration events (m) varied between 1 (migration between two populations and 15 (migration between all the populations). The value of "m" with the highest reproducibility and consistency, among the 15 tested, and which also had the highest loglikelihood value following six replication runs of the analysis, was chosen as the most optimal.
The f3 and f4 tests implemented in TreeMix were also performed. The f3-statistics (A, B, C) were to determine if A was derived from the admixture of populations B and C; a significantly negative value of the f3-statistics would suggest population A is admixed. The f4-statistics (A, B,) (C, D) were to test the validity of hierarchical clustering pattern in fourpopulation trees. Significant deviations of the f4-statistics from zero for the three possible topologies of the four-population trees would provide evidence of gene flow between the populations tested. A significantly positive Z-score indicates gene flow between populations that are related to either A and C or B and D while a significantly negative Z-score indicates gene flow between populations that are related to A and D or B and C. Standard errors were estimated using blocks of 500 SNPs.

Analysis of Signatures of Selection
For this analysis, we separated 12 of the 13 Ethiopian and Sudanese populations into four genetic groups based on the population clusters revealed by PCA. The four population groups included, western (Bonga, Kido, Gesses) and southern (Loya, ShubiGemo, Doyogena) long fat-tail, and fat-rump (Kefis, Adane, Arabo) sheep from Ethiopia and thin-tail sheep (Hammari, Kabashi) from Sudan. One short fat-tail sheep (Molale) was included with the fat-rump sheep and the other (Gafera), which appeared to be genetically distinct, was dropped from further analysis. Equal numbers of samples were chosen at random to represent each genetic group. Three comparisons which contrasted the fat-rump (E1), western (E2) and southern (E3) long fat-tail sheep with the thin-tail sheep (S) from Sudan were performed. The selection signature analysis involved three approaches, F ST , hapFLK and Rsb.
A sliding window approach was used to perform the F ST analysis using the HIERFSTAT package (Goudet, 2005) of R (R Core Team, 2012). The window size of 200 Kb, was allowed to slide along the genome by a distance of 60 Kb. The window size and slide distance were determined based on linkage disequilibrium (LD) decay analysis (Supplementary Figure 1). The pairwise F ST values (Weir and Cockerham, 1984) for each SNP in each window and between the genetic groups being tested were estimated as follows: Where p1, p2 and q1, q2 are the frequencies of alleles A and a in the first and second group of the test populations, respectively, and pr and qr are the frequencies of alleles A and a, respectively, across the tested groups (Zhi et al., 2018). The F ST values were standardized into Z-scores as follows: Where µF ST is the overall average value of F ST and σ F ST is the standard deviation derived from all the windows tested for a given comparison. Supplementary Figure 2 shows the distribution of the ZF ST values. We set the value of ZF ST ≥ 4 as the threshold to identify candidate genomic regions under selection.
The hapFLK approach was implemented with hapFLK package v1.2 (Fariello et al., 2013) to detect selection signatures based on differences in haplotype frequencies between groups of populations. Reynolds genetic distances were converted into kinship matrix using an R script supplied with the package. The hapFLK values and kinship matrix were calculated assuming 15 clusters in the fastPHASE model (-K 15). The hapFLK statistic was then computed as the average value across 40 expectation maximization (EM) runs to fit the LD model. The P-values were obtained by running a python script "Scaling_chi 2 _hapFLK.py" available at (https://forge-dga.jouy. inra.fr/documents/588) which fits a chi-squared distribution to the empirical distribution. As with the F ST calculations, the hapFLK statistics were also standardized using the formula: The calculation of the raw P-values was based on the null distribution of empirical values (Fariello et al., 2013;Kijas, 2014). The P-values were plotted in a histogram to assess their distribution pattern and the cut-off value to determine significance was set at -Log10 (P-value) ≥ 3 (Supplementary Figure 2). Using haplotype information, we performed the Rsb analysis implemented in rehh package (Gautier and Vitalis, 2012) of R. Haplotypes were estimated with SHAPEIT (Delaneau et al., 2014). To identify loci under selection, the Rsb values were log-transformed into P Rsb (P Rsb = -Log10 [1-2( (Rsb)−0, 5)]), where (x) represents the Gaussian cumulative distribution function (Gautier and Vitalis, 2012). Assuming that the Rsb values are normally distributed (under neutrality), the P Rsb can be interpreted as -Log10 (P-value), where P is the two-sided P-value associated with the neutral hypothesis. For each comparison, SNPs that exhibited P Rsb ≥ 3 (P-value = 0.001) were taken to be under selection (de Simoni Gouveia et al., 2017). The hapFLK and Rsb analysis were also performed using window sizes of 200 Kb sliding along the genome by a distance of 60 Kb.

Gene Annotation
Candidate regions that overlapped between F ST , hapFLK, and Rsb were identified and compared using the intersectBed function of Bed Tools software (Quinlan and Hall, 2010). Considering an average marker distance of between 60 and 200 Kb (Moioli et al., 2015) and the observed LD decay pattern (Supplementary Figure 1), candidate regions under selection were identified by exploring the SNPs found up-and downstream, and within, the most significant windows. The Oar v3.1 Ovine reference genome assembly (Jiang et al., 2014) was used to annotate the candidate regions. Functional enrichment analysis was performed using the functional annotation tool in DAVID (Huang et al., 2008) using Ovis aries as the background species. Gene functions were determined using the NCBI (http://www. ncbi.nlm.nih.gov/gene/) and OMIM databases (http://www.ncbi. nlm.nih.gov/omim/) and a review of literature.

Genetic Diversity and Population Structure
The average values of Pn, He, Ho, and F, as indicators of within-breed genetic diversity, are shown in Table 2   The PCA plot incorporating the global populations and which was constructed using a sample size of five animals that were selected at random per population, is shown in Figure 2. We used the uniform sample size of five animals since differences in sample sizes may influence clustering patterns on the PCA. The choice to use five samples per population was based on the smallest sample size of five individuals genotyped for Sidaoun and Berber breeds. In spite the sample size rebalancing, the population cluster patterns did not differ from that observed when the PCA was performed using unequal sample sizes (Supplementary Figures 4, 5). Generally, PC1 separates Ethiopian and South African fat-tail sheep, Sudanese thin-tail sheep, West African Djallonke and Algerian Sidaoun from the other sheep populations. Sheep from the Middle East and North Africa occur at the center of the PCA plot and, together with the Cyprus fat-tail and Chinese sheep (which cluster close together) are separated by PC2 from African Dorper, Barbados Blackbelly and European sheep. The two populations of Ethiopian short fat-tail sheep diverge from each other; Gafera-Washera clusters near Ethiopian long fat-tail sheep while Molale-Menz clusters together with the Ethiopian fat-rump sheep. The West African Djallonke clusters close to the two South African breeds (Ronderib and Namaqua). Sidaoun and Berber (both from Algeria) cluster separate, while the Cyprus fat-tail clusters close to the Chinese sheep (Figure 2).
To obtain a clearer picture of the variation within the fat-tail sheep, we performed the PCA excluding the thin-tail sheep (Figure 3). PC1 separates the Ethiopian fat-tails from their Middle East, North Africa, Mediterranean and Chinese counterparts. PC2 differentiates the South African breeds from the Ethiopian ones. Like the global PCA, one Ethiopian short fat-tail sheep (Gafera-Washera) clusters with the Ethiopian longfat tail sheep and the other (Molale-Menz) forms a cluster with the Ethiopian fat-rump sheep. Middle East sheep cluster together with the North African ones while the Mediterranean sheep unexpectedly cluster with the Chinese sheep despite the large geographic distance separating them.
To further illustrate the distribution of genetic variation among East African populations, we performed the PCA with only the Ethiopian and Sudanese thin-tail sheep (Figure 4). PC1 separates Ethiopian fat-rump, Molale-Menz (Ethiopian short-fat tail) and thin-tail sheep from the Ethiopian long fat-tail and Gafera-Washera (Ethiopian short-fat tail) sheep. Generally, PC1 separates the fat-rump sheep from the fat-tail ones derived from western and southern Ethiopia. PC2 reveals further separation of the Ethiopian sheep: (i) Molale-Menz, Adane and some Arabo individuals from Kefis, and the remaining Arabo individuals, and (ii) Gafera-Washera, Kido and Gesses from Doyogena, ShubiGemo, Bonga and Loya.
Admixture analysis on the global dataset, separates the study populations following their geographic origins ( Figure 5). The cross-validation error registered the lowest value at K = 9 suggesting this to be the most optimal number of clusters explaining the variation in this dataset (Supplementary Figure 6a). Chinese sheep separate from the other populations at K ≥ 3. Among African breeds, the South  African ones (Namaqua, Dorper, Ronderib) and the West African Djallonke show a distinct but common genetic ancestry with the Ethiopian and Sudanese sheep for 3 ≤ K ≤ 6.
Two to six hypothetical ancestral clusters (K) were tested with Admixture on the East African dataset. The lowest crossvalidation error suggests K = 4 (Supplementary Figure 6b) as the optimal number of ancestral clusters present in Ethiopian and Sudanese thin-tail sheep. The proportion of each ancestral cluster (referred to as A, B, C, and D) in each population at K = 4 is shown in Figure 6 and Supplementary Table 2     by the model covariance matrix (W) was used to identify the information contribution of each migration vector added to the tree. Up to 15 possible migration vertices were computed. The first eight migration edges (gene flow) accounted for more than half of the total model significance explained by the f statistic, with the first migration edge having a f value of 0.51. We therefore chose m = 8 as the best predictive value for the migration model. Vectors from 9 to 15 resulted in only small incremental changes in the f value (Figures 7A,B). The eight migration events were Loya and ShubiGemo (both long fat-tail); Arabo and Adane (both fat-rump); Gafera-Washera, Molale-Menz (both short fat-tail) and Adane (fat-rump); Molale-Menz (short fat-tail) and Adane (fat-rump) with ShubiGemo (long fattail); Bonga with ShubiGemo, Doyogena and Loya (all long fattail sheep); Molale-Menz (short fat-tail) and Arabo (fat-rump); ShubiGemo (long fat-tail) with Arabo (fat-rump) and Kefis (fat-rump); Gesses (long fat-tail) with Kabashi and Hammari (thin-tail).
The f4-statistics, also highlighted possibilities of gene flow among various breeds. The highest Z values (>|50|) were observed between Hammari and Kabashi (thin-tails) and Arabo and Kefis (fat-rump) (Supplementary Table 3). The f3-statistics however, did not reveal any likelihood of gene-flow between the breeds analyzed (Supplementary Table 4). This could be due to a complex pattern of gene-flow between the study populations, which may not be accounted for by a three-way model.

Signatures of Selection
The Admixture, TreeMix and PCA (Figures 6, 7;  Supplementary Figure 4) revealed three genetic groups in Ethiopian sheep viz fat-rump (E1), and long fat-tail from western (E2) and southern (E3) Ethiopia, respectively. The two short fat-tail sheep (Molale-Menz and Gafera-Washera) analyzed here were separated from each other (Figure 4) with Molale-Menz showing close genetic affinity to fat-rump sheep and Gafera-Washera appeared genetically distinct. The three groups are distinct from thin-tail (S) sheep (Figure 4). For selection signature analysis, we included Molale-Menz with the fat-rump sheep but excluded Gafera-Washera from the analysis due to its low sample size. We selected, at random, 20 samples to represent each of the four genetic groups and performed the selection signature analysis. We contrasted the three groups of Ethiopian sheep (E1, E2, and E3) with the thin-tail sheep (S). The top windows (Supplementary Table 5), which passed the significance threshold, for each method (hapFLK ≥ 3, ZF ST ≥ 4, Rsb ≥ 3) were used to define candidate regions under selection.
For E1 * S comparison, the fat-rump sheep were differentiated from the thin-tail in 23 candidate regions that overlapped between at least two selection signature methods and which spanned 86 genes (Figure 8, Table 3). Similarly, a total of 65 genes were present across 18 candidate regions that overlapped between at least two approaches in the E2 * S (western Ethiopia long fat-tail verses thin-tail) comparison (Figure 9, Table 4). Furthermore, 10 genes that seemed to be highly selected were identified by Rsb in three candidate regions on Oar8, Oar14, and Oar18, respectively (Figure 9, Table 4). Twelve overlapping candidate regions spanning 36 genes, were observed in the southern Ethiopian fat-tail verses thin-tail sheep (E3 * S) (Figure 10, Table 5). There were also 16 genes found across 1 (Oar26, 3 genes), 1 (Oar3, 1 gene), and 12 (Oar2, 1 gene; Oar3, 9 genes; Oar10, 2 genes) candidate regions that were identified by hapFLK, ZF ST , and Rsb, respectively (Figure 10, Table 5).
We performed gene ontology (GO) enrichment analysis for the candidate genes revealed in each pairwise comparison (Supplementary Table 6). The five topmost  GO terms associated with the candidate genes from the E1 * S comparison include

DISCUSSION
In this study, we used Ovine 50 K SNP BeadChip generated genotype data to investigate autosomal genetic diversity in Ethiopian indigenous sheep. Including populations from other regions of the world and the African continent allowed us to assess this diversity in a global geographic context. Our findings showed that the Ethiopian indigenous sheep are genetically differentiated from the other populations including other African fat-tail sheep (Figures 2, 3). The finding that the Ethiopian fattail sheep are distinct from those found in North Africa, support the presence of at least two genetic groups of fat-tail sheep in the continent and two separate introduction events, one via the Northeast Africa and the Mediterranean Sea coastline, and the other via the Horn of Africa crossing through the strait of Babel-Mandeb, respectively. The distinct clustering of the thin-tail sheep suggests its independent introduction into the continent. The fact that the South African Ronderib and Namaqua sheep occur on the same PC planar axis with Ethiopian sheep (Figure 2) may suggest, a common genetic heritage between the two rather than with the North African breeds. The movement of  sheep southwards remains speculative; some linguistic evidence suggests movement of bantu speaking populations from West Africa to South Africa through central Africa and following a western route rather than the more traditionally postulated eastern routes from East to South Africa (Newman, 1995). In such context a close clustering of the thin-tail West African sheep with some fat-tail southern African sheep breeds, such as the Namaqua from Namibia studied here is worth mentioning as it offers some possible insights. This however, will require further investigation beyond the scope of this study. Our results agree with previous findings that were arrived at using microsatellite loci (Muigai, 2003) and 50 K SNP genotype data (Mwacharo et al., 2017). They are also in line with archaeological and anthropological evidences indicating the introduction first, of thin-tail sheep into the continent followed by fat-tail sheep, initially through the Sinai Peninsula and later the Horn of Africa region (Gifford-Gonzalez and Hanotte, 2011;Muigai and Hanotte, 2013).
Interestingly, the PCA results involving Ethiopian and Sudanese sheep separate the Ethiopian populations into three groups while ADMIXTURE revealed four genetic clusters in Ethiopian sheep irrespective of their geographic origins in the country. TreeMix revealed extensive gene flow between populations of different geographic origins and tail-types. These results suggest, most likely, current and historical intermixing of sheep following human socio-cultural and economic interactions. This appears to be a common feature in Ethiopia and most likely the Northeast and eastern Africa region as it was also observed in Ethiopian goats by Tarekegn et al. (2018). We propose here that the common D genetic background present in short fat-tail and fat-rump sheep may represent historical introgression of the thin-tail gene pool into short fat-tail and fat-rump genepool. This result calls for further investigation.
Our findings on the genetic relationships and differentiation between Ethiopian sheep populations agree with findings of previous studies, which were performed using either microsatellites (Gizaw, 2008) or 50 K SNP genotype data (Edea et al., 2017) and which indicated a grouping of Ethiopian indigenous sheep populations based on their tail phenotypes. However, uniquely in our study, the long fattail populations were further subdivided into two secondary groups representing sheep populations from western and southern Ethiopia (Figure 4). These two groups were also defined by different genetic backgrounds by ADMIXTURE (Figure 6) and they clustered separately in TreeMix (Figure 7). In addition, although they are defined by the same tail phenotype, the two populations of Ethiopian short fattail sheep did not cluster together. Geographic isolation coupled, most likely, with adaptation to different ecoclimates, as well as ethnic, cultural and religious practices and differences, that can impede gene flow, may have shaped this genetic sub-structuring (Madrigal et al., 2001;Gizaw et al., 2007).
In selection signature analysis, we contrasted groups of Ethiopian indigenous sheep that showed variation in the size of the fat-tail with thin-tail sheep. Our results identified several genes as potential candidates controlling tail morphotype and fat localization in the study populations. Several genes occurred within candidate regions that overlapped between at least two of the three approaches used to detect signatures of selection (hapFLK, F ST , Rsb). The F ST approach detects signatures arising from an increase or decrease in allele frequency differentiation between populations/breeds, hapFLK detects the same but based on increase/decrease in haplotype frequency differentiation between populations while accounting for hierarchical population structure (Kijas, 2014) while Rsb detects signatures associated with the patterns of linkage disequilibrium between loci across the genome (Oleksyk et al., 2010;de Simoni Gouveia et al., 2014). Since these methods are based on different algorithms and assumptions, if common signatures are detected by at least two of the methods it suggests good reliability of the results while reducing the likelihood of interpreting false positives. They also detect signatures spanning different time periods; the F ST and hapFLK detect signatures arising from long term differential selection while Rsb detects ongoing signatures of selection including those that arise in the short to medium term (Oleksyk et al., 2010).
In the E1 * S comparison, three genes associated with growth traits were present on the candidate region on Oar2, i.e., histidine triad nucleotide binding protein 2 (HINT2), sperm associated antigen 8 (SPAG8) and natriuretic peptide receptor 2 (NPR2). Previous studies reported these genes to be associated with birth and carcass weights, and fat depth, respectively, in cattle (Casas et al., 2000;McClure et al., 2010) and sheep (Moradi et al., 2012;Wei et al., 2015). We also identified two genes on Oar5 (ANGPTL8, INSR), which might be responsible for fat accumulation in adipose tissues. Angiopoietin-like 8 (ANGPTL8), when induced by insulin receptor (INSR), inhibits lipolysis and controls post-prandial fat storage in white adipose tissue and directs fatty acids to adipose tissue for storage during the fed state (Mysore et al., 2017). The ADAMTS3 (ADAM metallopeptidase with thrombospondin type 1 motif 3) gene was present in the region identified on Oar6. This gene is expressed in cartilage, where collagen II is a major component, as well as in embryonic bone and tendon, suggesting that it could be a major procollagen processing enzyme in musculoskeletal tissues (Dubail and Apte, 2015). The homeobox B13 (HOXB13) and ALX homeobox 4 (ALX4) were identified on the candidate region on Oar11 and Oar15, respectively. Mutations in the former result in overgrowth of caudal spinal cord and tail vertebrae in mice (Economides et al., 2003), while the latter is involved in the development of limbs and skeleton (Fariello et al., 2014).
Our enrichment analysis for the E1 * S genes revealed a cluster of genes (BMP4, MED1) with functions that could possibly be related to tail formation. Bone Morphogenetic Protein 4 (BMP4) was revealed by Rsb and F ST to be on a candidate region on Oar7 and it has been implicated in tail formation (Moioli et al., 2015). Peroxisome Proliferator Activated Receptor Gamma (PPARG) expression has been associated with back-fat thickness in sheep (Dervish et al., 2011). Ge et al. (2008) reported Mediator Complex Subunit 1 (MED1) is an essential protein for the optimal functioning of PPARG. Despite this association, our analysis did not reveal any signals spanning PPARG, but two of our methods (Rsb and F ST ) revealed a signature on Oar20 that spanned the PPARD gene, a paralogue to PPARG.
In the same comparison (E1 * S), we identified a cluster of genes (CDH8, ADRB3, THRA, TRPM8, PLAC8) that are associated with the GO biological process, response to cold. This is not surprising considering that three out of the four E1 populations are living at a high altitude and therefore in a relatively cold habitat. Indeed, Adreno receptor Beta 3 (ADRB3) plays a major role in energy metabolism and regulation of lipolysis and homeostasis (Wu et al., 2012). It is also associated with birth weight, growth rate, carcass composition and survival in various breeds of sheep (Horrell et al., 2009). The ion channel TRPM8 has been reported to play a major role in eliciting cold defense thermoregulation, metabolic and defense immune responses in humans (Kozyreva and Voronova, 2015).
Several other genes occurring in the E1 * S candidate regions and which are associated with the GO term embryonic skeletal system development (GO:0048706) included HOXC6, SULF2, WNT11, and HOXB9. WNT11 was identified by ZF ST on Oar15 while HOXC6 and HOXB9 were revealed by hapFLK on Oar3 and Oar13, respectively. The WNT gene family and the T gene have been implicated in vertebral development in laboratory mice (Greco et al., 1996), and with the short-tail phenotype in sheep (Zhi et al., 2018). In addition, the roles of the WNT gene family in lipid metabolic processes in fat-tail sheep have also been reported (Kang et al., 2017). The HOX genes represent transcriptional regulatory proteins that control axial patterning in bilaterians (Garcia-Fernàndez, 2005), where the inactivation of one of the HOX genes often causes transformations in the identity of vertebral elements (Mallo et al., 2010). The HOX genes are able to control morphologies along the anteroposterior axis (Lewis, 1978). Furthermore, HOXC11, HOXC12, and HOXC13 developmental genes were found to be expressed in the tail region indicating their possible associations with tail size and fat development in fat-tail sheep (Kang et al., 2017).
The candidate regions revealed by the E2 * S comparison, spanned 65 candidate genes. Three genes of the BPI fold Containing Family B (BPIFB3, BPIFB4, and BPIFB6) were present in a candidate region on Oar13. These, along with other paralogs (BPIFB1, BPIFA3, BPIFB2, BPIFA1), formed a cluster of functional genes related to the GO term, lipid binding functional process (Supplementary Table 6). In contrast to the E1 * S comparison, the cluster of genes identified in the E2 * S comparison were associated with the GO terms, Magnesium ion binding, response to gamma radiation and cellular response to heat. This suggests, most likely, the propensity of this group of sheep to adapt to the eco-climatic conditions prevailing in their home-tract. This is consistent with the humid highland and moist lowland conditions of the geographic area where the populations representing the E2 group (Bonga, Gesses, Kido) were sampled. High fecundity and prolificacy is a common reproductive trait preferred by farmers in the Bonga sheep (field observations by the last author). This may explain the occurrence of the CIB4 and PRKAA1 in a candidate region in the E2 * S comparison. The CIB4 gene was suggested to be linked, in some way, to high fecundity in the small Tail Han sheep (Yu et al., 2010) and PRKAA1 is involved in ewe's follicular development (Foroughinia et al., 2017).
The third comparison (E3 * S) resulted in 36 genes that occurred in candidate regions that were revealed by at least two methods used to detect selection signatures. Fatty acid binding proteins FABP3 and FABP1 found on candidate regions on Oar2 and Oar3, respectively are the genes that relate most closely to fat deposition. SREBF1 along with PPARG are the main transcription factors controlling lipogenesis in adipose tissue and skeletal muscle (Ferré and Foufelle, 2010), and are mainly regulated by fatty acid-binding proteins (FABP) (Lapsys et al., 2000). Recently, Bahnamiri et al. (2018) evaluated the effects of negative and positive energy balances on the expression pattern of these genes in fat-tail and thin-tail lambs. They observed differential transcriptional regulation of lipogenesis and lipolysis during periods of negative and positive energy balances in the two groups of lambs. In general, the cluster of genes identified in this comparison was significantly enriched for GO terms relating to skin development, wound healing and regulation of actin cytoskeleton reorganization (Supplementary Table 6).
The overlapped genes between all comparisons are shown in Supplementary Figure 7. The commonest genes between the three comparisons are TSPAN8, RXFP2, and RIN2. The TSPAN8 (Tetraspanin 8) occurred in the candidate region on Oar3; it is among the genes that are reported to be associated with insulin release and sensitivity, and obesity in humans (Grarup et al., 2008), while the relaxin family peptide receptor 2 (RXFP2) has been associated with horn morphology (Johnston et al., 2011;Wiedemar and Drögemüller, 2015).
Twelve genes (MELK, RNF38, GNE, CLTA, CCIN, RECK, HINT2, SPAG8, NPR2, FAM221B, MSMP, RGP1) were common between E1 * S and E2 * S comparisons. On Oar2, three genes were identified within the overlapping candidate region, i.e., CLTA which is associated with prion protein deposition in sheep (Filali et al., 2014), GNE which is important for the metabolism of sialated oligosaccharides in bovine milk (Wickramasinghe et al., 2011) and RECK which encodes an inhibitor of angiogenesis, invasion and metastasis, DNA methylation, and increased mRNA in cell lines in humans (Su, 2012). Other genes (i.e., HINT2, SPAG8, and NPR2) are associated with fat deposition in sheep as herein discussed for each of the three comparisons.
Furthermore, one gene (DIS3L2) was in a candidate region that overlapped between the E1 * S and E3 * S comparisons. DIS3 like 3'-5' exoribonuclease 2 (DIS3L2) has also been identified, among genes involved in cancer, cellular function and maintenance, and neurological disease, in a candidate region under selection in cattle (Gautier et al., 2009). In sheep, using F ST , iHS, and Rsb, de Simoni Gouveia et al. (2017) indicated that DIS3L2 is among genes associated with height variation. In addition, DIS3L2 has reportedly been associated with the Perlman syndrome, which is characterized by overweight in humans (Astuti et al., 2012).

CONCLUSION
Overall, our results revealed four distinct autosomal genomic backgrounds (A, B, C, D) in Ethiopian indigenous sheep. The genotypes of most of the individuals analyzed were made up of at least two genetic backgrounds which could be accounted for by some level of current or historical admixture between populations. Selection signature analysis identified several putative candidate regions spanning genes relating to skeletal structure and morphology, fat deposition and possibly adaptation to environmental selection pressures. Our results indicate that Ethiopian indigenous sheep could be a valuable animal genetic resource that can be used to understand genetic mechanisms associated with body fat metabolism and distribution. This is especially important because fat deposits are a crucial component of adaptive physiology and excessive fat deposition in adipose tissue can result in obesity and overweight, and energy metabolism disorders in humans.

ETHICS STATEMENT
The animals used in this study are owned by farmers. Prior to sampling, the objectives of the study were explained to them in their local languages so that they could make an informed decision regarding giving consent to sample their animals. Government veterinary, animal welfare and health regulations were observed during sampling. The procedures involving sample collection followed the recommendation of directive 2010/63/EU. Skin tissues importation and/or exportation was permitted by the Ethiopian Ministry of Livestock and Fisheries under Certificate No: 14-160-401-16.

AUTHOR CONTRIBUTIONS
AbA, JM, and OH conceived and designed the study. AbA analyzed the data and together with JM wrote the manuscript. JM and OH revised the manuscript. HB provided support in data analysis. SM, FP, and EC contributed to genotyping and genotype data of non-Ethiopian breeds (Najdi, Omani, and Libyan Barbary) and provided critical inputs on data analysis and in writing the manuscript. FA, MA, and MOA supported the sampling and genotyping of Najdi, Omani and Libyan sheep. AK and AyA lead and coordinated the sampling of Ethiopian sheep HM lead and coordinate the sampling of Sudanese sheep. All authors contributed to the interpretation of the results based on their knowledge on local indigenous sheep genetic resources of their respective countries. All the authors read and approved the final manuscript.

ACKNOWLEDGMENTS
This study was conducted during AbA's PhD study which is sponsored by the Libyan Ministry of Higher Education and Scientific Research and the University of Misurata. Sampling of Ethiopian sheep was supported by the CGIAR Research Program on Livestock (Livestock CRP) and accordingly, ICARDA and ILRI wish to thank the donors supporting the Livestock CRP.