Genome Divergence and Dynamics in the Thin-Tailed Desert Sheep From Sudan

With climate change bound to affect food and feed production, emphasis will shift to resilient and adapted indigenous livestock to sustain animal production. However, indigenous livestock comprise several varieties, strains and ecotypes whose genomes are poorly characterized. Here, we investigated genomic variation in an African thin-tailed Desert Sheep sampled in Sudan, using 600K genotype data generated from 92 individuals representing five ecotypes. We included data from 18 fat-tailed and 45 thin-tailed sheep from China, to investigate shared ancestry and perform comparative genomic analysis. We observed a clear genomic differentiation between the African thin-tailed Desert Sheep and the Chinese thin-tailed and fat-tailed sheep, suggesting a broad genetic structure between the fat-tailed and thin-tailed sheep in general, and that at least two autosomal gene pools comprise the genome profile of the thin-tailed sheep. Further analysis detected two distinct genetic clusters in both the African thin-tailed Desert Sheep and the Chinese thin-tailed sheep, suggesting a fine-scale and complex genome architecture in thin-tailed sheep. Selection signature analysis suggested differences in adaptation, production, reproduction and morphology likely underly the fine-scale genetic structure in the African thin-tailed Desert Sheep. This may need to be considered in designing breeding programs and genome-wide association studies.


INTRODUCTION
A common research thread that links population and quantitative genomics is the elucidation of patterns and processes underlying population genetic structure. Whether such structure is stable in time and space is increasingly addressed for its utility in determining how many genetically distinct populations exist and their inter-relationships (Nielsen et al., 1999;Waples and Gaggiotti, 2006). Insights from such investigations inform management decisions that define conservation units and the design of genetic monitoring and breeding programs (Palsbøll et al., 2007;Schwartz et al., 2007). The tropics and sub-tropics are home to a large reservoir of indigenous livestock with a high degree of adaptive resilience, and which support agricultural and nonagricultural industries with minimal anthropogenic interventions (FAO, 2007). Indigenous livestock can therefore provide a foundation to sustain production under increasing challenges resulting from global warming and rising human demand for livestock products. It is also likely that future livestock production will come from marginal areas where arable agriculture is at high risk of failure and thus particular attention would have to be given to the uniqueness of genetic features, as it is difficult to predict the future importance of traits and alleles (Taberlet et al., 2008).
Domestic sheep (Ovis aries) are central to national economies as a source of cash, meat, milk, fiber, etc., and to traditional societies as repositories of socio-cultural values. Sheep are also essential components of diverse production systems due partly to their versatility to adapt to local biophysical and production environments. Domestic sheep comprise three broad types: thin-tailed, fat-tailed, and fatrumped sheep (Porter, 2020). Thin-tailed sheep are the most ancient and in the African continent, two types are recognized: the long-legged (Sahelian) and the tropical Dwarf (Djallonké) sheep. The Sahelian is confined to the hot arid marginal environments in eastern, western and northern Africa, while the Djallonke is well adapted to subhumid and humid tropics of western and central Africa. The analysis of mitochondrial genomes has shown that the Sahelian and Djallonke comprise separate maternal ancestries (Brahi et al., 2015).
The long-legged thin-tailed sheep found in Sudan represents the complexity that is typical of most indigenous livestock. They are subdivided into Desert, Nilotic, Arid upland, Arid equatorial, and West African populations, including their inter-crosses, following their eco-geographic distribution (Abualazayium, 2004). Within the Sudanese thin-tailed Desert Sheep, a longlegged Sahelian thin-tailed sheep, at least eight ecotypes, Hammari, Kabashi, Shanbali, Shugor, Dubasi, Watish, Al Ahamda, and Borouge, have been described (Abualazayium, 2004). Whether these ecotypes represent real underlying genetic variation remains unknown. If confirmed, they could offer a powerful genetic model to investigate drivers of divergence in indigenous livestock. Understanding fine-scale genetic structure is also important to control confounding effects of population stratification in association studies (Price et al., 2010). In this study, we applied distance-and model-based comparative genomic approaches to 600K single nucleotide polymorphism (SNP) genotype data from 121 individuals of five ecotypes of the Sudanese thin-tailed Desert Sheep, to investigate their genome architecture and dynamics. We analyzed the dataset alongside similar data from one fattailed and four thin-tailed breeds of sheep from China, to investigate their shared genome ancestry and for comparative genomic assessment.

Sample Collection and DNA Extraction
Whole blood was collected through jugular venipuncture from 121 animals representing five ecotypes of thin-tailed Desert Sheep in Sudan (Supplementary Tables 1, 2). Blood samples were also collected from 65 individuals of four breeds of Chinese sheep, including one fat-tailed breed (Tan Sheep) from a dry lowland environment in Ningxia Province and three thin-tailed breeds (Oula, Zeku, and Black Tibetan) from the alpine high-altitude in Qinghai Province (Supplementary Table 1); these were used for comparative genomic analysis. The DNeasy R Blood and Tissue Kit (Qiagen Inc., United States) and the phenol-chloroform method were used to extract DNA from the Sudanese and Chinese samples, respectively. The samples were genotyped with the Ovine Infinium 600K BeadChip at GeneSeek Inc. (Sudanese samples) and Compass Biotechnology (Chinese samples). Of the 606,006 SNPs present in the BeadChip, 577,401 are autosomal, 27,314 are on the X-chromosome and 1,291 remain unmapped.

Data Screening and Quality Control
The genotypes from both sets of samples were merged and the data were screened for quality with PLINK (Chang et al., 2015). Samples with more than 10% missing genotypes, SNPs with less than 90% genotype call rates, Hardy-Weinberg equilibrium (HWE) threshold of 1e-10 and minor allele frequency (MAF) < 0.01 were discarded. Using the "genomemin 0.05 -max 1" flag, the Pi-HAT statistic was calculated to assess the level of genetic relatedness between individuals and determine outliers with the objective of excluding the outliers and at least one amongst a pair of individuals that showed close genetic relationship. The value of 0.1875 which represents the half-way point between the 2nd and 3rd degree relatives was used as the cut-off threshold. These filtering thresholds left 155 samples (Table 1) and 519,711 SNPs which were used for selection signature analyses. Using PLINK, the 519,711 SNPs were subjected to linkage disequilibrium (LD) pruning using the parameters 50, 5, and 0.5 representing window size (kb), step size (kb), and r 2 threshold, respectively, resulting in 390,243 SNPs that were used for population structure and phylogenetic analysis.

Genetic Diversity, Relationship, and Structure
The "het" and "ibc" options in PLINK were used to calculate the observed (H O ) and expected (H E ) heterozygosity, inbreeding coefficient F and Pi-HAT statistics. The "detectRUNS" package (Biscarini et al., 2019) was used to scan the genomes for runs of homozygosity (ROH) using the consecutive method (Marras et al., 2015). The parameters used to detect ROH were: (i) minimum number of SNPs in a sliding window = 50, (ii) minimum ROH length = 1 Mb, (iii) minimum number of consecutive SNPs for each ROH = 50, (iv) number of heterozygous SNPs per window = 1, (v) missing SNP calls per window = 5, (vi) minimum SNP density = 1 SNP/100 kb, and (vii) maximum gap between consecutive SNPs = 1 Mb. The ROH coefficient depicting genome-wide inbreeding (F ROH ) was computed as the ratio of the total length of ROH to the length of autosomes (2.45 Gb) (McQuillan et al., 2008).
To explore demographic dynamics in the African thin-tailed Desert Sheep and in the thin-tailed and fat-tailed sheep from China, the trends in LD over genomic distances were examined by calculating the correlation coefficient (r 2 ) between alleles at two SNP loci using the "indep" option in PLINK. Following Sved (1971), the effective population size (N E ) was estimated with the equation N Et = (1/4c) (1/r 2 -1), where N Et represents the effective population size t generations ago, t = 1/2c, r 2 is the LD between pairwise SNPs and c is the genetic distance in Morgan between a pair of SNPs (Tortereau et al., 2012). Weir and Cockerham (1984) F ST statistic was calculated to determine the extent of genetic differentiation using Arlequin v.3.5.2 (Excoffier and Lischer, 2010). The significance of the pairwise F ST values was determined following 1,000 permutation replications of the dataset.
To investigate genetic structure, we performed neighbourjoining (NJ) phylogeny, principal component analysis (PCA) and ADMIXTURE modeling. The NJ tree was generated to visualize relationships using pairwise F ST genetic distances. MEGA7 (Kumar et al., 2016) was used to construct the NJ tree with the five ecotypes of thin-tailed Desert Sheep and the four sheep breeds from China anchoring at the nodes. PCA was performed using the "pca" option in PLINK and the first two PCs were visualized using GENESIS (Buchmann and Hazelhurst, 2014). Genomic ancestry and admixture were investigated using the unsupervised clustering algorithm implemented in the ADMIXTURE toolkit v1.3 (Alexander et al., 2009). The patterns of population structure were explored by varying the number of assumed ancestral clusters between 2 ≤ K ≤ 8. Five iterations were performed for each K, summarized using CLUMPP (Jakobsson and Rosenberg, 2007) and visualized with GENESIS.

Genome-Wide Scans for Signatures of Divergent Selection
The NJ tree, PCA and ADMIXTURE revealed evidence of broad-and fine-scale genetic structures in the datasets. To detect genomic regions showing divergence between the observed genetic structures, we analyzed the dataset for signatures of divergent selection.
Using the detectRUNS package, we implemented the frequency-based threshold approach to define genome-wide ROH islands in each genetic cluster that was revealed by the NJ tree, PCA and ADMIXTURE. This approach set a percentage of animals within a genetic cluster or a population having a SNP in an ROH region as the threshold. The threshold used in our analysis to select the genomic regions with a high frequency of ROH islands was 50%. Private ROH islands were determined by filtering out homozygous variants in ROHs in a specific genetic cluster, but which could not be found in ROHs of the other genetic clusters. This allowed the detection of either whole or segments of ROHs that were either shared or were private to a genetic cluster. The frequency of ROHs were plotted against their physical genomic positions and visualized with a Manhattan plot.
Signatures of selection were investigated using F ST (Weir and Cockerham, 1984). This approach analyses allele frequency differentiation between populations and detects almost complete or long-term selection signatures. The F ST algorithm was implemented with HIERFSTAT (Goudet, 2005) using a window size of 200 kb and a sliding step size of 60 kb. The window and slide-step sizes were inferred from LD decay patterns. The pairwise F ST values for each SNP in each window between the genetic clusters tested were estimated as: where p1, p2 and q1, q2 are the frequencies of alleles "A" and "a" in the first and second genetic clusters, respectively, and pr and qr are the frequencies of alleles "A" and "a, " respectively, across the tested genetic clusters (Zhi et al., 2018). The F ST values were standardized into Z-scores as follows: where µF ST is the overall average value of F ST and σF ST is the standard deviation (SD) derived from all the windows tested for a given comparison. Supplementary Figure 1 shows the We also investigated signatures of selection using XP-EHH (Sabeti et al., 2007). It assesses haplotype differences between two test populations, and it can detect a wider range of selection scenarios, including selection on standing variation or incomplete sweeps or on-going selection (Sabeti et al., 2007;Innan and Kim, 2008). The test uses the integrated EHH (iHH) of a core SNP in two test populations, rather than two alleles in one test population. The unstandardized XP-EHH statistic is calculated as: where iHHA and iHHB are the integrated EHH of a given core SNP in the A and B test populations, respectively. A large and positive value of XP-EHH at a locus suggests selection in reference population while a negative value suggests selection in the alternate one. We used the software developed by Pickrell and Pritchard (2012) to estimate the unstandardized XP-EHH statistics for all SNPs in each comparison. These were then standardized using their means and variances, and the p-values were estimated from the standard normal distribution (Supplementary Figure 2). For each comparison, we determined the regions under selection based on the threshold P value < 0.001.
To investigate the functional significance of selection signatures, the candidate regions revealed by ROH, F ST , and XP-EHH were identified using the intersectBed function of BedTools software (Quinlan and Hall, 2010). The O. aries reference genome assembly (OAR_v3.0) was used to annotate the candidate regions using the BioMart Tool in Ensembl and NCBI databases. The functional annotation tool in DAVID v6.8 (Huang et al., 2009) was used to perform enrichment and gene ontology (GO) analyses using O. aries as the background. The functions of genes overlapping the candidate regions were also determined from literature.

Genetic Diversity and Differentiation
The average values of H O and H E for the Sudanese thintailed Desert Sheep and the Chinese breeds exceeded 0.300 ( Table 1) (Table 1). This supported a low level of genetic relatedness between the individuals, which were confirmed by the low values of inbreeding (average F ROH = 0.014 and F = 0.009). This validated our sampling strategy which targeted only mature unrelated animals. The Sudanese thin-tailed Desert Sheep had on average longer lengths of ROH, higher F ROH but lower F compared to the Chinese breeds.
We investigated the trends in LD decay over genomic distances ( Figure 1A) and in N E over generation time ( Figure 1B). Across all genomic distance intervals, the Sudanese thin-tailed Desert Sheep showed lower LD compared to the Chinese breeds. At 1,000 generations ago, the Sudanese thintailed Desert Sheep had lower N E compared to the Chinese sheep. However, the thin-tailed Desert Sheep showed an expansion in N E up to around 270 generations ago, after which it started to contract drastically up to the present time. The N E for the Chinese breeds remained stable up to around 500 generations ago, upon which it also started to contract but at a gradual pace to the present time.
Genetic differentiation was assessed by calculating pairwise F ST values between ecotypes and breeds (Supplementary Table 3). The overall highest levels of genetic differentiation were Frontiers in Genetics | www.frontiersin.org To visualize the extent of genetic differentiation, we generated an NJ phylogeny with the ecotypes and breeds at the nodes using the pairwise F ST genetic distance matrix (Figure 2A). The tree revealed a clear separation of the fat-tailed Tan Sheep from the Chinese thin-tailed sheep and the Sudanese thin-tailed Desert Sheep. The latter clustered very close together, suggesting lower genetic differentiation. The HZ was separated from both ZK and QOL, suggesting a genetic divergence within the Chinese thintailed sheep. ZK and QOL had the lowest value of F ST between them. An NJ phylogeny for the five ecotypes of Sudanese thintailed Desert Sheep was generated to obtain a better picture of the extent of their genetic relationships ( Figure 2B). It showed that AL was genetically differentiated from BU, HA, KA, and SH.
We used individuals as the taxonomic units and performed a PCA to ascertain the divergence revealed by the NJ tree. In the first instance, all the study individuals were included in the PCA, and PC1 and PC2 explained 26.50 and 8.09% of the total genetic variation, respectively ( Figure 2C). PC1 separated the Chinese breeds into two groups showing a marked concordance with their tail type and geographic origin. One group included the Tan Sheep (fat-tailed sheep from the lowlands, Ningxia Province) while the other group comprised HZ, QOL, and ZK (thin-tailed sheep from the alpine high-altitude, Qinghai Province). This grouping corresponded to the NJ phylogeny. Although HZ appeared to be separated from both QOL and ZK, its divergence was not as distinct as on the NJ tree. PC2 separated the Chinese breeds from the five ecotypes of Sudanese thin-tailed Desert Sheep which, as in the NJ tree, clustered close together. To further investigate the clustering pattern in the latter, we performed the PCA but excluding the Chinese breeds. It revealed three outlier individuals of KA and a slight separation of AL from the other ecotypes ( Figure 2D). Given the likelihood that the three outliers could be masking the divergence of AL, we performed another PCA while excluding them. It resulted in a clear divergence of AL from the other ecotypes ( Figure 2E). Given this result, we excluded the three outliers from subsequent analyses as their extreme divergence could not be explained.
We used the ADMIXTURE tool to investigate genome architecture and complement the NJ phylogeny and PCA. Running ADMIXTURE using all the study individuals resulted in the lowest CV error at K = 3 ( Figure 3A). At this K, three gene pools were observed ( Figure 3B). The first comprised the thintailed Desert Sheep from Sudan, the second was made up of the thin-tailed sheep from China (ZK, QOL, and HZ) and the third is exclusive to the Chinese fat-tailed Tan Sheep. We suggest that this reveals the broad-scale genetic structure defining the dataset. Considering the results of the NJ phylogeny and PCA, which showed AL to be genetically divergent and a slight separation of HZ from both ZK and QOL, we investigated the ADMIXTURE patterns at 4 ≤ K ≤ 8 ( Figure 3B). At K = 4, HZ diverges from ZK and QOL and this divergence is retained up to K = 8. At K = 5, AL diverges from the other ecotypes of Sudanese thintailed Desert Sheep by its unique genetic background which is also retained up to K = 8. As the pattern of genetic structure remains consistent from 5 ≤ K ≤ 8 and it corresponds to the clusters revealed by the NJ phylogeny (Figures 2A,B) and PCA ( Figures 2C,E), we suggest that K = 5 represents the fine-scale genomic structure in the study individuals. We therefore refer the two genetic clusters in the Sudanese thin-tailed Desert sheep as SD_G1 (AL) and SD_G2 (BU, HA, KA, and SH). We also classify the Chinese breeds into three genetic clusters, i.e., CN_G1 (Tan Sheep), CN_G2 (ZK and QOL), and CN_G3 (HZ).

Signatures of Selection
Based on these genetic clusters, we used ROH, F ST and XP-EHH to investigate the genetic signatures of their divergence. We performed the ROH analysis to identify ROH segments that were specific to SD_G1 or SD_G2. The F ST and XP-EHH were used for comparative selection signature analyses that contrasted the two genetic clusters of the Sudanese thin-tailed Desert Sheep with each other and with two of the Chinese clusters. That was, for SD_G1, the following comparative analyses were performed: SD_G1 verses SD_G2, CN_G1, or CN_G2. A similar comparative analysis was performed for SD_G2. We excluded CN_G3 from these analyses because its genome showed an admixed profile. All the candidate regions and genes identified by each approach for each comparison involving SD_G1 and SD_G2 are shown in Supplementary Tables 4-10.
The ROH analysis identified 107 ROH regions, overlapping 286 genes, that were specific to SD_G1 (Figure 4A), and 78 ROH regions, spanning 281 genes, that were specific to SD_G2 ( Figure 4B). In total, 88 ROH regions were common between SD_G1 and SD_G2. The most significant ROH region which was common to SD_G1 and SD_G2 occurred on OAR3 (10,700,001-11,800,000 bp) and spanned 19 genes ( Table 2). For the Chinese groups, 146 and 69 ROH islands spanning 257 and 43 genes were specific to CN_G1 ( Figure 4C) and CN_G2 ( Figure 4D), respectively.
The XP-EHH identified 32 candidate regions in the comparative analysis between SD_G1 and SD_G2 ( Figure 5A). These regions spanned 73 putative genes and the most significant region was the same as that identified by ROH on OAR3 ( Table 2). Between SD_G1 and CN_G1, XP-EHH identified 34 candidate regions ( Figure 5B) and against CN_G2, it identified 46 candidate regions ( Figure 5C). These regions spanned 83 and 225 genes, respectively. The most significant region with CN_G1 was on OAR6 (85,447,324-85,695,088 bp) and spanned six genes while the one with CN_G2 occurred on OAR14 (34,400,001-34,600,000 bp) and spanned 10 genes ( Table 2). The XP-EHH analysis between SD_G2 and CN_G1 ( Figure 5D) or CN_G2 ( Figure 5E) identified 31 and 46 candidate regions which spanned 83 and 208 genes, respectively. The most significant regions occurred on OAR10 (78,200,001-78,500,000 bp) and OAR14 (34,400,001-34,600,000 bp) spanning 7 and 10 putative genes ( Table 3), respectively.
The F ST analysis identified 73 candidate regions with extreme allele frequency differentiation between SD_G1 and SD_G2 ( Figure 6A), which spanned 288 putative genes. The most significant region was on OAR6 (69,700,001-70,000,000 bp) spanning two genes ( Table 2). Between SD_G1 and CN_G1, F ST revealed 24 regions spanning 56 genes ( Figure 6B) and between SD_G1 and CN_G2, it identified 38 regions spanning 89 genes ( Figure 6C). For SD_G2, F ST identified 35 and 33 candidate regions that differentiated the group with CN_G1 and CN_G2, respectively. These regions spanned 107 and 68 genes, respectively, and the most significant regions occurred on OAR20 (16,646,531-16,720,282 bp) and OAR3 (129,700,001-129,900,000 bp) spanning seven and two genes ( Table 3), respectively.
In general, there were 48 candidate regions in SD_G1, overlapping 206 genes that were simultaneously identified by a combination of at least two methods (ROH, F ST and/or XP-EHH) and/or two comparative analyses (SD_G1 verses SD_G2/CN_G1/CN_G2) ( Table 2). Among these regions, 39 were identified by ROH with either F ST or XP-EHH, four by all the three approaches and five by F ST and XP-EHH. For SD_G2, 47 candidate regions overlapping 209 genes across 18 autosomes were identified by a combination of at least two approaches and/or two comparative analyses (Table 3). Among these regions, 24 were identified by a combination of ROH with either F ST or XP-EHH, two by all the three approaches and 19 by F ST and XP-EHH. The three approaches (ROH, XP-EHH, and F ST ) identified a total of 697 and 765 putative genes that overlapped with candidate regions identified in SD_G1 (Table 4) and SD_G2 (Table 5), respectively. These genes were used for GO and enrichment analyses for each group and the top-most significant GO terms and KEGG pathways are shown in Tables 4, 5.
The "hyaluronan metabolic process (GO:0030212)" is the most common GO term across the comparisons. Since it is associated with a wide range of functions (Stern et al., 2006), it may be relevant to the two groups of Sudanese thin-tailed Desert sheep.

DISCUSSION
The history of indigenous livestock and their physiological, anatomical and genetic responses to natural and artificial selection is at the core of their diversity (phenotypic and genetic) and resilience. Here, we present findings of the analysis of genomic variation in the indigenous African long-legged thintailed Desert Sheep from Sudan. The overall average H O and H E for the Sudanese thin-tailed Desert Sheep exceeded 0.300, suggesting high genetic variation. The values for the individual ecotypes are, however, similar to those reported in Barki sheep (Kim et al., 2016), an indigenous breed from a hot desert environment in Egypt, are higher than the values reported in Ethiopian sheep (Edea et al., 2017), but fall within the range reported in sheep raised by South African smallholder farmers (Molotsi et al., 2017) and in New Zealand breeds (Brito et al., 2017). The level of diversity in the four Chinese breeds analyzed here is similar to that of the Sudanese thin-tailed Desert Sheep in spite some of the breeds, such as the Tan Sheep and HZ having been exposed to artificial selection. Ascertainment bias and deliberate avoidance of inbreeding in the Chinese breeds could explain this result. The former should however be seen from the context that the Ovine Infinium R HD SNP BeadChip carries assays for 606,006 loci with an average spacing of around 5 kb across the genome. These loci were selected from groups that differed in their MAF across 75 animals from an international panel of 34 domestic sheep breeds and two wild species of sheep (Bighorn and Thinhorn) (Anderson et al., 2014). The chip was also validated using 288 samples, generating 99% average call rates across SNPs and animals, and a call rate repeatability of 99.9978%.
The lack of stringent artificial selection coupled with random mating in the Sudanese thin-tailed Desert Sheep may explain their high levels of genetic diversity but low levels of genetic differentiation and inbreeding. The former may be enhancing their fitness and could be responsible for their adaptive resilience to the desert environments where they are raised.
We investigated demographic dynamics by assessing the trends in LD over genomic distances and N E over the past 1,000 generations. All the samples analysed showed a rapid decay in LD within the first 300 kb. From around 0.1 Mb, the Chinese breeds had higher r 2 values, which most likely reflected an attempt at   This r 2 value ranked below the minimum threshold range of 0.33 ≤ r 2 ≤ 0.80 that is meaningful for GWAS (Ardlie et al., 2002;Carlson et al., 2004). Much denser marker coverage may thus be required for association analysis in the thin-tailed Desert Sheep       and the Chinese breeds. Besides its importance for mapping traits, LD allows the estimation of N E over generation time (Pritchard and Przeworski, 2001), acts as a significant genetic parameter for understanding population dynamics and can act as a measure of long-term performance of a population with regard to diversity and inbreeding (Fernández et al., 2005), and is useful for evaluating the risk status of populations/breeds (FAO, 1998;Duchev et al., 2006). A contraction in N E from 270 and 500 generations ago was revealed in both the Sudanese thintailed Desert Sheep and the Chinese breeds, respectively. These contractions appeared not to have resulted in a concomitant reduction in genetic diversity. The contraction in the Chinese breeds may be associated with the start of their establishment as distinct breeds while that in the Sudanese thin-tailed Desert Sheep is difficult to explain. However, whole-genome sequence analysis has revealed declines in N E in Ethiopian, Libyan and Sudanese sheep (Ahbara, 2019) and in Ethiopian goats based on the analysis of 50K SNP genotype data (Tarekegn et al., 2020). Mbole-Kariuki et al. (2014) also reported a bottleneck event in East African shorthorn Zebu cattle from western Kenya. The reduction in population sizes of the other African sheep and goat populations falls within the time period of the shrinkage in Sudanese thin-tailed Desert Sheep, while that of the East African shorthorn Zebu cattle started around 240 years ago. Historical fluctuations in climatic patterns resulting in cycles of favorable and deteriorating conditions in the continent (Verschuren, 2007) have been used to explain the fluctuations in N E . The NJ phylogeny, PCA and ADMIXTURE allowed us to reveal the underlying genetic structure and divergence in the datasets analysed. We hypothesized that the ADMIXTURE pattern at K = 3, which was supported by NJ phylogeny and PCA, revealed an underlying broad-scale genetic structure as it showed (i) a separation of African sheep from the Chinese breeds, (ii) a separation of fat-tailed sheep (Tan breed) from both African and Chinese thin-tailed sheep, and (iii) different genetic backgrounds in both the African and the Chinese thin-tailed sheep. While the first result suggest genetic divergence that has most likely arisen due to reproductive isolation between sheep in different continents coupled with genetic drift, the second result confirm the existence of differences in genetic makeup of fattailed and thin-tailed sheep. A similar result based on the analysis of microsatellites was reported previously between African fattailed and thin-tailed sheep (Muigai, 2003). Furthermore, the divergence of the fat-tailed Tan Sheep from the other Chinese breeds can be due to artificial selection for lamb pelt and lustrous white curly fleece in the Tan sheep. The clear divergence between the Sudanese thin-tailed Desert Sheep and the Chinese thintailed sheep suggest at least two possibilities: (i) the domestication of at least two autosomal gene pools of thin-tailed sheep in the Fertile Crescent and their subsequent independent dispersal westwards to Africa and eastwards to China, and (ii) the dispersal of one gene pool followed by genetic divergence driven by genetic drift due to reproductive isolation and natural selection driving adaptation to low altitude hot arid environments in Africa or high altitude alpine arid environments in China. Although mitogenome analysis has identified two waves of sheep migration comprising three maternal lineages across eastern Eurasian (Lv et al., 2015), the study did not include sheep from Africa and therefore the first scenario is difficult to ascertain. We therefore favor the second scenario given that recent analysis of autosomal genomic data has provided compelling evidence for climatemediated selection pressure driving genetic divergence in sheep (Lv et al., 2014) while differences in precipitation affecting feed availability has also been shown to drive variation in natural selection at a global scale (Siepielski et al., 2017).
A combination of NJ phylogeny, PCA and ADMIXTURE (K > 5) results also revealed what we hypothesized to be a fine-scale genetic structure among the individuals analyzed. The analyses revealed at least two distinct genetic clusters in the Sudanese thin-tailed Desert Sheep; one which was specific to AL and another to the four remaining ecotypes. The analyses also identified two genetic clusters in the Sudanese thin-tailed sheep from China: one cluster was exclusive to HZ and another occurred in QOL and ZK and at low frequency in HZ. Given that the five ecotypes of thin-tailed Desert Sheep are all derived from a low altitude hot arid environment while the Chinese thintailed breeds are adapted to an alpine/sub-alpine high-altitude rangeland, this fine-scale genetic structure was unexpected. It  SLC23A2, LPGAT1, PREP, OXT, DNPH1, EFTUD2,  HOXA10, EPB41L4B, XPO1, RASSF2, SOX15, MMP28,  TUBB1, XPO5, MEIOC, POLH, PNISR, PDGFRA, USP45,  ATP1B2, MYT1, THOC7, ENAH, GLTPD2, RNF167,  NEIL2, IRF4, RUFY3, SLFN14, ZNF438, PRKD1,  CACTIN, TP53, KIZ, DTL, ATG4D, GTF2A1, ITIH4,  SPTBN5, PCNA, UBA6, TUBGCP2, SATB2, SRF,  PRICKLE2, CACNA1A, DBF4B, PSMB10, CENPC,  EPB41L5, ORC4, FXR2, ATXN7, PSMB1, STXBP5,  TSNAXIP1, DTD2, GINS1, CDKN2D, KLHDC3, NOP58,  TGFB1  may hint at a complex genome architecture in the thin-tailed sheep because it cannot be explained by differential selection for adaptation. It is, however, likely that artificial selection may be driving the divergence in the Chinese thin-tailed sheep. With a common genomic background observed in QOL and ZK and the same occurring at a low frequency in HZ, it suggests that HZ has been gradually diverging from QOL and ZK. HZ has a predominantly black coat while QOL and ZK have mainly white coats with some black to brown patches around heads and legs. QOL and ZK also occur in close geographic proximity and can thus cross mate while HZ is isolated and farmers keeping this breed avoid mating it with other breeds, so as to maintain its distinct black coat, genetic purity and recognition. We investigated molecular signatures of selection to gain insights on the divergence of SD_G1 and SD_G2 genetic clusters of the Sudanese thin-tailed Desert Sheep. For this reason, we used three approaches and contrasted SD_G1 and SD_G2 with each other and with two of the genetic clusters: CN_G1 and CN_G2 observed in Chinese sheep. We used the ROH analysis to provide evidence for selection within a cluster. If such signatures overlapped with those identified by F ST and/or XP-EHH and were observed in at least two comparative analyses, then we took this to be a reliable evidence of positive selection in the specific genetic cluster. We therefore limited our discussion to the putative genes found within the candidate regions that overlapped between at least two approaches and in more than two comparative analyses involving the two genetic groups observed in the Sudanese thin-tailed Desert Sheep (Tables 2, 3).
Based on our criteria, nine candidate regions that overlapped between at least two approaches and were observed in at least two comparative analyses were identified in SD_G1. These candidate regions were located on OAR3, 5, 6 (two regions), 10, 13, 18 (two regions), and 26 (one region) ( Table 2). Within these candidate regions, we found genes that are associated with functions relating to adaptation to abiotic and biotic factors and morphology. For instance, the most significant region on OAR6 (Figures 7, 8) spanned six genes, one of which was AMBN (ameloblastin), a candidate gene for gastrointestinal nematode resistance in sheep (Atlija et al., 2016;Berton et al., 2017). The PDGFRA gene occurred in another candidate region on OAR6 (Figures 7, 8). It has been reported to be a key gene in cytokine signaling and was observed to be amongst genes in biological pathways that are involved in host immune responses against parasitic infections (Benavides et al., 2015). PDGFRA has also been reported to be tightly associated with the dominant white coat color in the pig (Johansson et al., 1992;Moller et al., 1996). Nazari-Ghadikolaei et al. (2018) also found significant association between PDGFRA and KIT with white coat color in Markhoz goat from Iran. The white coat color predominated in AL based on field observations made by the first author during sampling. PDGFRA has also been linked to fat deposition in Libyan long fat-tailed sheep (Mastrangelo et al., 2019) while CAMK4 found on OAR5 has been associated with tail morphology and fat deposition in sheep (Moradi et al., 2012;Ma et al., 2018;Ahbara et al., 2019;Li et al., 2020). These results likely suggest that divergent selection for parasite resistance, coat color, differential fat deposition and tail morphology may explain the divergence of SD_G1. With the lack of phenotypic data, our suggestions would need to be confirmed with more detailed studies that include an assessment of appropriate phenotypes.
Our criteria also revealed four candidate regions that overlapped between at least two approaches and in at least two comparative analyses in SD_G2 (Table 3). These regions were observed on OAR3 (three regions) and OAR8 (one region). The region on OAR8 was the strongest and it spanned three genes, FAM120B, PSMB1, and PDCD2. FAM120B has been suggested to be a potential candidate for twinning rate in humans (Painter et al., 2010) and in lowly ovulating mammals (Vinet et al., 2012). The PSMB1, which was the top candidate gene at this region, has been associated with functional adaptation of the transcriptome to mastitis-causing pathogens in mammary gland of livestock (Loor et al., 2011). The SOCS2, which was found in one of the candidate regions on OAR3, has been linked with body weight and milk production, and an increased susceptibility to inflammation of mammary glands (Rupp et al., 2015). It is important to take note that most pastoral societies in Africa prefer larger sized animals with higher milk production as offspring from such animals are thought to thrive better in hot arid environments. This indirect preference for such individuals may be responsible for this selection signal. SOCS2 has also been reported to play important roles in key adaptive phenotypes in sheep, including general immune response following infection Al Kalaldeh et al., 2019) and resistance to gastrointestinal nematode (Haemonchus contortus) (Estrada-Reyes et al., 2019). The proteasome PSMB7 and the heat shock protein HSPA5, both found in the region on OAR3, were reported to be upregulated during blastocyst implantation in hamsters (Lei et al., 2014). These results suggest that the divergence of SD_G2 could be associated with differential resistance to bacterial infections, productive and reproductive performance. However, as for SD_G1, this finding will need to be investigated further with much detailed analyses of individuals with relevant phenotypes.
The candidate region on OAR3 (10,700,001-11,800,000 bp) differentiated both SD_G1 ( Table 2) and SD_G2 (Table 3) from both CN_G1 and CN_G2 (Figure 9) and was therefore of interest. The top significant window at this region spanned five genes (NR6A1, NR5A1, ADGRD2, PSMB7, and NEK6) and the topmost significant position was close to NR5A1. The expression of NR5A1 drives Leydig cell differentiation from progenitor cells (Barsoum and Yao, 2010), thus initiating steroidogenesis (Martin and Tremblay, 2010). In mice, NR5A1 has been shown to be essential in the formation of pituitary, gonad and adrenal glands (Luo et al., 1994). Another region that differentiated SD_G1 and SD_G2 from both CN_G1 and CN_G2 occurred on OAR14. Two genes were present in this region, one of which was AP1G1. Proteomic analysis of sheep uterus has revealed the role of AP1G1 in prolificacy (La et al., 2020). Whether the function of NR5A1 and AP1G1 suggested inherent genetic differences in fertility and prolificacy between the Sudanese thin-tailed Desert Sheep and the Chinese breeds is difficult to say in the absence of appropriate phenotype data. Strong evidence suggests that NR6A1 is a strong candidate gene underlying vertebrae number in domestic pigs (Mikawa et al., 2007;Rubin et al., 2012) and the number of thoracic vertebrae in domestic sheep (Li et al., 2019). From a phenotypic perspective, this result is interesting as it suggests that the Sudanese thin-tailed Desert Sheep can be differentiated from the Chinese thin-tailed sheep based on the number of vertebrae. Indeed, the tail of Sudanese sheep consists of 23 caudal vertebrae (Ahbara, 2019), while that of Chinese sheep comprises less than 18 (Zhi et al., 2018).
The African long-legged thin-tailed sheep is raised in desert environments where they are exposed year-round, to complex interacting biophysical stressors, including high temperatures, physical exhaustion, direct solar radiation, feed and water scarcity. In this context, the revelation of the GO biological term "hyaluronan metabolic process (GO:0030212)" in four out of the six comparisons involving SD_G1 and SD_G2 is relevant. Hyaluronan (HA) comprises a major component of the extracellular matrix (ECM) in vertebrates and is a straight chain glycosaminoglycan which mediates diverse functions depending on molecular size and tissue concentration, both of which are regulated by the balance between its biosynthesis and catabolism (Hascall and Esko, 2015). HA occurs virtually in all vertebrate tissues and fluids, but skin, the first defense against environmental insults, is its largest reservoir containing more than 50% of the total body HA. The HA content of dermis is far greater than that of epidermis, and accounts for most of the 50% of the total body HA in skin. HA has excellent water retention ability and remarkable tissue hydration capacity, and at high concentrations, as found in the ECM of dermis and epidermis, it regulates water balance and osmotic pressure, functions as an ion exchange resin and regulates ion flow (Stern, 2010). HA also functions in the reepithelization process due to several of its properties, including being an integral part of the ECM of basal keratinocytes, the major constituent of epidermis, its free-radical scavenging function and its role in keratinocyte proliferation and migration (Tammi et al., 1989). It has been observed that the content of HA increases in the presence of retinoic acid (vitamin A) (Tammi et al., 1989) and the effects of retinoic acid against UV-induced skin damage may be correlated, at least in part, with an increase in skin HA content, giving rise to increased tissue hydration. It has been suggested that the free-radical scavenging property of HA contributes to protection against repeated exposure to the sun's UV radiation (Tuhkanen et al., 1998;Averbeck et al., 2007). The rapid turnover of HA in tissues suggests tightly controlled modes for modulating steady state levels of HA and this is important because rapid increases are required in situations of extreme stress (Kobayashi et al., 2020). Therefore, the ability to provide immediate high HA levels acts as a survival mechanism for vertebrates and may explain the rapid rates of HA turnover under basal conditions. Furthermore, HA plays a role in innate immunity. Although it binds to CD44, there is evidence showing that HA degradation products transduce inflammatory signal through toll-like receptor 2 (TLR2), TLR4 or both in macrophages and dendritic cells. Thus, the HA metabolic process may be facilitating the adaptation to desert environments in African long-legged thin tailed sheep.

CONCLUSION
In this study, we analyzed the genetic diversity, structure and signatures of selection in the African thin-tailed Desert Sheep sampled from Sudan. We included one breed of fattailed and three breeds of thin-tailed sheep from China for comparative genomic analysis. We found high levels of genetic diversity but low levels of genetic differentiation among the five ecotypes of Sudanese thin-tailed Desert Sheep. The analysis also revealed a broad-and fine-scale genetic structures in the sheep analyzed, suggesting that these would need to be accounted for in genome-wide association analysis in the discovery of the genetic basis of important traits and in breeding program design. Selection signature analysis identified candidate regions that could potentially differentiate the two genetic clusters observed in the African thin-tailed Desert Sheep from the two genetic clusters observed in the Chinese thin-tailed sheep. These regions spanned a set of potential candidate genes associated with traits of adaptive, production and reproduction significance as well as morphological differentiation. While our study provides a foundation for understanding the genome structure and dynamics of African indigenous sheep, it reveals findings that could form the basis of studies that combine genomic and phenomic approaches in the quest to understand the genome architecture of indigenous livestock.

DATA AVAILABILITY STATEMENT
The data presented in the study are deposited in the figshare repository, accession number doi: 10.6084/m9.figshare.1478 5278.v1.

ETHICS STATEMENT
The blood samples from the Sudanese thin-tailed Desert Sheep were collected after consent was granted by flock owners and local administration officials in Sudan. No further permissions were required from the ethics committee of the Organization of Veterinary Service, Government of Sudan. All animal experiments in this study were fully approved by the Animal Care and Use Committee of the Institute of Animal Sciences, Chinese Academy of Agricultural Sciences (IAS-CAAS) with the following reference number: IASCAAS-AE-03, on September 1, 2014.

AUTHOR CONTRIBUTIONS
QZ, YM, and JM conceived, designed, and supervised the study and genotyped the Chinese breeds. JM, MR, and AH genotyped the Sudanese thin-tailed Desert Sheep. FE organised the sampling in Sudan. AAh and AAb analyzed the data with support from HB, RI and LX. AAb, AAh, J-LH, and JM wrote and revised the manuscript. All authors read and approved the final version.