Whole-genome resequencing reveals genomic variation and dynamics in Ethiopian indigenous goats

Ethiopia has about 52 million indigenous goats with marked phenotypic variability, which is the outcome of natural and artificial selection. Here, we obtained whole-genome sequence data of three Ethiopian indigenous goat populations (Arab, Fellata, and Oromo) from northwestern Ethiopia and analyzed their genome-wide genetic diversity, population structure, and signatures of selection. We included genotype data from four other Ethiopian goat populations (Abergelle, Keffa, Gumuz, and Woyto-Guji) and goats from Asia; Europe; and eastern, southern, western, and northern Africa to investigate the genetic predisposition of the three Ethiopian populations and performed comparative genomic analysis. Genetic diversity analysis showed that Fellata goats exhibited the lowest heterozygosity values (Ho = 0.288 ± 0.005 and He = 0.334 ± 0.0001). The highest values were observed in Arab goats (Ho = 0.310 ± 0.010 and He = 0.347 ± 4.35e−05). A higher inbreeding coefficient (FROH = 0.137 ± 0.016) was recorded for Fellata goats than the 0.105 ± 0.030 recorded for Arab and the 0.112 ± 0.034 recorded for Oromo goats. This indicates that the Fellata goat population should be prioritized in future conservation activities. The three goat populations showed the majority (∼63%) of runs of homozygosity in the shorter (100–150 Kb) length category, illustrating ancient inbreeding and/or small founder effects. Population relationship and structure analysis separated the Ethiopian indigenous goats into two distinct genetic clusters lacking phylogeographic structure. Arab, Fellata, Oromo, Abergelle, and Keffa represented one genetic cluster. Gumuz and Woyto-Guji formed a separate cluster and shared a common genetic background with the Kenyan Boran goat. Genome-wide selection signature analysis identified nine strongest regions spanning 163 genes influencing adaptation to arid and semi-arid environments (HOXC12, HOXC13, HOXC4, HOXC6, and HOXC9, MAPK8IP2), immune response (IL18, TYK2, ICAM3, ADGRG1, and ADGRG3), and production and reproduction (RARG and DNMT1). Our results provide insights into a thorough understanding of genetic architecture underlying selection signatures in Ethiopian indigenous goats in a semi-arid tropical environment and deliver valuable information for goat genetic improvement, conservation strategy, genome-wide association study, and marker-assisted breeding.


Introduction
Goats (Capra hircus) are of economic, nutritional, and cultural significance to humankind.Their domestication is believed to have happened in the Near East around 11,000 years ago from a mosaic of wild bezoar populations (Capra aegagrus) (Zeder, 2008;Daly et al., 2018).Human migrations and trade dispersed goats to diverse environments, and through adaptation, they integrated successfully into these environments.
The detection of genomic regions under positive selection can give insights into phenotypic evolution driven by different breeding objectives or adaptation to local environments (Andersson and Georges, 2004).Over the last decade, many genome-wide selection signatures have been detected in different goat breeds that are associated with production (Wang et al., 2016), liter size (Lai et al., 2016;Guang-Xin et al., 2019;Tao et al., 2021;Wang et al., 2021), adaptation (Kim et al., 2016;Wang et al., 2016), disease resistance (Lee et al., 2016), cashmere fiber (Li et al., 2017) and multiple traits (Guo et al., 2018).However, there are no such studies on the Ethiopian indigenous goats except Berihulay et al. (2019) whohowever, analyzed only two Ethiopian goat populations, Abergelle and Begait.In general, the studies have demonstrated how positive selection acting on complex traits has changed the genetic composition of domestic goats.
Ethiopia is home to ~52 million goats (CSA, 2021), a great majority of which are indigenous genotypes.They occur in large flock sizes kept by pastoralists in arid and semi-arid environments, while small flock sizes maintained by agro-pastoralists are widely distributed in the highlands (Abegaz et al., 2014).Using morphometric traits, FARM-Africa (1996) classified Ethiopian indigenous goats into 13 populations.However, this classification may have inadvertently classified genetically similar populations as separate entities.
Among the indigenous goat populations of Ethiopia, Arab, Fellata, and Oromo are known to be well-adapted to Ethiopia's semi-arid region of Benishangul-Gumuz (Getinet et al., 2005) and represent the primary livestock species and breeds raised in the area.The region's altitudinal landscape ranges from 550 m to 2,500 m above sea level (Sinmegn et al., 2014), and its rich ancient and recent human socio-economic, political, and cultural history could have impacted the genome landscape of the indigenous goats.For instance, the two main ethnicities in Benishangul-Gumuz (Berta/Arab and Gumuz) were historically closely associated with neighboring areas of Sudan (where they extend) and, to a lesser extent, with the Ethiopian highlands.Various trade routes dating to the Axumite era (100-940 AD) met in Benishangul-Gumuz, where goods, including gold, livestock, iron, coffee, ivory, and honey, were exchanged.The Axumite kingdom was a trading empire with its hub in Eritrea and northern Ethiopia, which at times extended across most of present-day Eritrea, northern Ethiopia, Western Yemen, and parts of eastern Sudan (Fattovich, 2019).Furthermore, between 1979 and the mid-1980s, there was a relocation of a large population of inhabitants from the Ethiopian highlands (Amhara) to Benishangul-Gumuz due to drought and famine (Mulatu, 1991;Gebru, 2009).The region is also known for its intense solar radiation, feed and water scarcity, and tsetse infestation (Duguma et al., 2015).We, therefore, hypothesize that these historical, contemporary, and environmental events could have impacted the genetic makeup of indigenous goats from Benishangul-Gumuz, which could have contributed to their genetic differentiation, admixture with other Ethiopian indigenous goats, and adaptation to such an environment.
In this study, using whole-genome sequence data, we investigated i) the genetic diversity and population structure of three Ethiopian indigenous goat populations while mapping their genetic profiles to other goats from Africa, Asia, and Europe and ii) signatures of selection associated with adaptation to arid and semi-arid environments, immune response, and production and reproduction.For the latter, we present an assessment of the genome structure between Arab, Fellata, and Oromo goats descended from a semi-arid tropical environment in Ethiopia and that of Tibetan goats from a cool highland temperate environment in China and exposed to contrasting selection pressures, natural versus artificial, between the following population pairs: Arab vs. Tibetan, Fellata vs. Tibetan, and Oromo vs. Tibetan.

Animals and whole-genome sequencing
Thirty (30) unrelated animals representing three northwestern Ethiopia indigenous goat populations (Arab, Fellata, and Oromo) were sampled for this study.Whole blood was collected from each individual by puncturing the jugular vein using EDTA-coated vacutainer tubes while adhering to the guidelines on animal welfare and care of the Ministry of Livestock and Fisheries of the Federal Democratic Republic of Ethiopia.Genomic DNA was extracted from whole blood using DNeasy ® Blood and Tissue kit (https://www.QIAGEN.com,accessed on 11 September 2019) following the manufacturer's protocol with a few modifications.The integrity of the extracted DNA was checked in a 1% agarose gel.The concentration and purity of the DNA were determined by spectrophotometer readings at 260 nm and 280 nm, respectively (DeNovix Inc., Wilmington, DE, USA).Whole-genome sequencing at a depth of 10X was done with the NovaSeq 6000 ™ platform at Tianjin Noozhiyuan Technology Co., Ltd.
Genotype data from 60 animals representing ten goat populations were obtained from the VarGoats project (available from the European Nucleotide Archive (ENA), project number PRJEB37507, accessed on 16 August 2021) in FASTQ format and from which we extracted filtered genotype data.We included Ethiopian (Abergelle, Keffa, Gumuz, and Woyto-Guji) and Kenyan (Boran) goat populations to represent East African goats, Malawian (Thylo) to represent South African goats, Malian (Guera) to represent West African goats, Moroccan (Unknown) to represent North African goats, French (Saanen) to represent European goats, and Chinese (Tibetan) to represent Asian goats.A detailed description of the environmental characteristics of the geographic areas of the study populations is given in Supplementary Table S1.

Filtering process
The multisample raw VCF of each goat population was filtered by VariantRecalibrator v4.1.7.0 of GATK.Two training resources, one with true sites ("known = false, training = true, truth = true, prior = 15.0") and the other with non-true sites ("known = true, training = false, truth = false, prior = 2.0") used for recalibration were dbSNP variants obtained from Ensembl v105.The variant call annotations (for SNPs and InDels) DepthOfCoverage (DP), QualByDepth (QD), RMSMappingQuality (MQ), MappingQualityRankSumTest (MQRankSum), ReadPosRankSumTest (ReadPosRankSum), FisherStrand (FS), and StrandOddsRatio (SOR) were used for VariantRecalibrator.Based on the SNP tranches (Supplementary Figure S1), no falsepositive variants were observed in the 90 tranche.Therefore, the 99 tranche was included to increase the sensitivity of variant discovery.We considered the highest tranche (99.9-100) as a false positive and excluded it.The remaining variants (SNPs and InDels) were then recalibrated at the truth sensitivity filter level (tranche) of 99 using ApplyVQSR v4.1.7.0 of GATK.In general, the entire filtration process resulted in a high confidence set of 35,161,094 biallelic autosomal SNPs and 3,737,445 InDels.The final set of SNPs was annotated using variant effect predictor (VEP) v104.3.

Genomic diversity analyses
Genomic diversity for each population was analyzed using various metrics, including the proportion of polymorphic SNPs (Pn), nucleotide diversity (π), and genomic expected (He) and observed (Ho) heterozygosity.Estimates of Pn-the fraction of total SNPs that displayed both alleles-were calculated as the proportion of SNPs with minor allele frequency (MAF) greater than 0.01.The π analysis (Nei and Li, 1979) is a method that uses SNPs to calculate the average difference between any two nucleotide sequences in a population.The π values were computed based on the sliding window method within 100-kb windows with 50 kb step size along the autosomes using the VCFtools v0.1.15(Danecek et al., 2011).The Ho was calculated as the proportion of total heterozygous SNPs to the total number of sites counted in each genome, based on the ARS1 C. hircus reference genome.The observed and expected heterozygosity was calculated using the "--het" option of PLINK v1.9 (Purcell et al., 2007) for each genome and then averaged for each population.
The average pairwise genetic distance (D) between individuals within a population was calculated in PLINK v1.9 (Purcell et al., 2007).The average proportion of alleles shared between two individuals was calculated using the "--genome" command line in PLINK v1.9 as: D ST IBS 2 +(0.5 X IBS 1) /N, where IBS 1 and IBS 2 represent the number of loci that share either one or two alleles that are identical by state (IBS) in pairwise comparisons between individuals, respectively, and N is the number of loci tested.The genetic distance between all pairwise combinations of individuals was calculated as D = 1 − D ST .
The runs of homozygosity (ROHs) were estimated with PLINK v1.9 by invoking the "--homozyg" option.ROHs are uninterrupted stretches of homozygous genotypes common among individuals within a population (McQuillan et al., 2008;Marras et al., 2015).The degree of ROH variation between populations was characterized based on differences in the length and number of ROH fragments between populations.The following PLINK parameters and thresholds (Purcell et al., 2007) were applied to define an ROH region: (i) minimum number of SNPs in ROH or in sliding window = 50, (ii) minimum length of ROH = 100 kb, (iii) minimum number of missing SNP in the ROH = 1, (iv) minimum allowed density of SNPs within a run = 1 SNP/100 kb, (v) minimum number of heterozygous SNPs in each ROH = 1, and (vi) maximum gap between consecutive homozygous SNPs = 1 Mb.ROHs enable reliable estimation of the level of inbreeding.We estimated the ROH-derived genomic inbreeding coefficient (F roh ) following McQuillan et al. (2008) as F roh L roh /L auto , where L roh is the total length of ROH of each individual in the genome, and L auto is the length of the goat autosomal genome (~2400 Mb) (Kumar et al., 2018).To compare the ROH length between populations, four length categories were allocated: 100-150 Kb, >150-250 Kb, >250-400 Kb, and >400 Kb (classified as ROH 100-150Kb , ROH 150-250Kb , ROH 250-400Kb , and ROH >400Kb, respectively).For each ROH length category, we summed all ROH per animal and averaged this per population.
In order to investigate the potential of our approach, we focused on two length classes: from 100 kb to 400 kb to investigate ancient events and >400 kb to address recent events.

Population genetic structure analyses
We used three methods to detect the genetic differentiation and structure among the studied goat populations: a principal component analysis (PCA) to visualize patterns in relationships between individuals using the "--pca" command line in PLINK v1.9 (Purcell et al., 2007);Treemix v.1.13 (Pickrell and Pritchard, 2012) to construct a maximum likelihood (ML) tree and visualize population splits and the directionality of gene flows between populations; and ADMIXTURE v.1.3(Alexander et al., 2009) to perform the ancestry proportion of individual genomes.
The PCA was performed at three levels: global, East African, and Ethiopian.To summarize the relationship among individuals, we plotted the first two eigenvectors.For this analysis, we filtered out SNPs with minor allele frequency (MAF) < 0.01, leaving 27,728,833 SNPs that were used.The graphical display of PCA results (PC1 and PC2) was visualized with GENESIS (Buchmann and Hazelhurst, 2014).
In the TreeMix analysis, the number of migration events (m) varied between one (1) (migration between two populations) and thirteen (13) (migration between all populations).The TreeMix plotting script was used to visualize the trees and the proportion and direction of gene flow events between the 13 goat populations with R. The number of gene flow events that best fit the data was identified using the fraction of the variance in the sample covariance matrix explained by the model covariance matrix (Pickrell and Pritchard, 2012).To evaluate the confidence of the trees' edges and nodes, 1000 bootstrap replicates were performed.Before running migration events, it is important to define the position of the root using prior information about known out-groups (Pickrell and Pritchard, 2012).We used the Tibetan goat population as the root (-root) in our analyses.Furthermore, to test for admixture among breeds and to assess the statistical significance of gene flow events, the THREEPOP and FOURPOP functions (implemented in TreeMix) were run to calculate the f3 and f4 statistics, respectively (Ahbara et al., 2019).For the f3 test (A; B, C), the putative admixture of a target population (A) is tested against two source populations (B, C).A significantly negative value of the resulting Z-score indicates A is admixed.The f4 test (A, B; C, D) investigates the tree topology of four populations.A significantly positive Z-score suggests gene flow between populations A and C or B and D that surpasses any gene flow between populations A and D or B and C. A significantly negative Z-score, however, indicates gene flow between populations A and D or B and C that surpasses any gene flow between A and C or B and D. Standard errors were estimated using blocks of 500 SNPs.
For the ADMIXTURE analysis, we used 8,344,942 autosomal SNPs that were retained after we pruned the 27,728,833 SNPs used in PCA for linkage disequilibrium (LD).ADMIXTURE was run for K = 2 to K = 13.A five-fold cross-validation (CV) procedure was applied to determine the optimal number of clusters (K).GENESIS (Buchmann and Hazelhurst, 2014) was used to display the ADMIXTURE OUPUTS.

Genomic selection signature analysis
We implemented two methods for the selection signature analysis: i) genetic differentiation based on F ST (Weir and Cockerham, 1984) and ii) within-group pooled heterozygosity (H P ) (Rubin et al., 2010).A sliding window approach was used to perform the F ST analyses with VCFtools (v.0.1.15)using a 100-kb sliding window size (≥30 SNPs) and a 25-kb step size according to the method used previously (Lai et al., 2016;Guo et al., 2018) following the equations: F ST = 1 − (p1q1 + p2q2)/2prqr; where p1, p2 and q1, and q2 are the frequencies of alleles A and a in the first and second group of the test populations, respectively, and pr and qr are the frequencies of alleles A and a, respectively, across the tested groups (Zhi et al., 2018).The F ST values were standardized into Z-scores as follows: Z(F ST ) Z(Fstµ(F ST ))/σ(F ST ); where µ is the mean, and σ is the standard deviation of F ST derived from all the windows tested between test groups.Finally, the top 1% of windows showing both extremely high F ST values (the top 1% of Z (F ST ) distributions) and extremely low H P scores (the bottom 1% of ZH P distributions) (Guo et al., 2018) were proposed to be under selection.
The H P values were also estimated using a window size of 100 kb and a sliding step size of 25 kb (Guo et al., 2018) with the formula: H P = ( 2 nMaj nMin)/( nMaj + nMin) 2; Where nMaj and nMin are the sums of major and minor allele frequencies, respectively, for all the SNPs in the 100-kb window.The values for the H P calculated for each window size were then Z-transformed using the equation: Z (HP) = (Hpµ(Hp))/(σ(Hp)), where µ is the mean and σ is the standard deviation of H P .Because sex chromosomes and autosomes are subjected to different selective pressures and have different effective population sizes, we calculate the ZHp for autosomes of specific breeds only.Putative genomic regions that overlapped between these two approaches were defined as candidate selective regions.

Candidate gene analysis
Based on genome annotation, a gene was deemed to show evidence of being under selection if it overlapped with an outlier genomic window based on both ZF ST and ZHp.Hence, we annotated the candidate genomic regions using the Ensemble BioMart tool (https://www.ensembl.org/biomart,accessed on 16 February 2024) based on the Caprine reference genome (ARS1).Using C. hircus as a background species, we performed functional enrichment analysis of all the annotated genes using Database for Annotation, Visualization, and Integrated Discovery (DAVID, https://david.ncifcrf.gov/content.jsp?file=release.html,accessed on 16 February 2024) v6.8 (Huang et al., 2009).Categories with the threshold of adjusted p-value <0.05 after the Bonferroni correction were defined as significantly enriched terms and pathways.To infer gene functions, we consulted the NCBI database (https://www.ncbi.nlm.nih.gov) and reviewed the literature.

Genome sequence mapping and SNP calling
The SNP and InDel summary statistics of the sequence parameters for each population are shown in Supplementary Tables S2, S3, respectively.The highest number of SNPs (18,266,925) and InDels (2,033,758) were detected in the Unknown (UNK) population from Morocco, while the lowest number of SNPs (14,389,837) and InDels (1,739,360) were identified in Saanen (SAN; Europe) breed and Gumuz (GUM; Ethiopia) breed, respectively.The average number of SNPs per sample ranged from 6,591,579 (Keffa (KEF); Ethiopian) to 8,866,137 (Woyto-Guji (WGU); Ethiopian).The average number of InDels ranged from 660,074 (Keffa; KEF) to 976,619 (Abergelle (ABR); Ethiopian).The heterozygosity to homozygosity (het/hom) ratio ranged from 1.03 (Keffa (KEF)) to 1.62 (Unknown (UNK)).A comparison with the C. hircus dbSNP database revealed that 29%-35% of the SNPs and 78%-82% of the InDels were not present in the database.

Runs of homozygosity (ROHs)
The frequency of ROHs was calculated from homozygous sequences across each genome for the four genome length categories (ROH 100-150Kb , ROH 150-250Kb , ROH 250-400Kb , and ROH >400Kb ) analyzed here (Figure 1).A total of 57,910 ROH segments were identified across the 78 individuals.Thyolo had the highest number (10,979) of ROH segments, while Saanen had the lowest number (909) of ROH segments (Table 2).The ROHs comprised mostly (58%) the shorter (ROH 100-150Kb ) length category, while a very small proportion (~4%) of the ROH length category was detected in the longer ROH segment (ROH >400Kb ).

Population structure
The genetic relationship between individuals was analyzed with PCA, which we performed at three levels: global, East African, and Ethiopian.At the global level, the PCA separated the 13 populations into four groups that corresponded to their geographic origin, viz, 1) Asian, 2) European, 3) South and East African, and 4) North and West African.In this PCA, PC1 and PC2 explained 13.32% and 10.66%, respectively, of the variation in the entire genetic data (Figure 2).PCA plots for the first two components (PC1 and PC2, the respective variations explained in brackets) for the global goat populations.
PCA plots for the first two components (PC1 and PC2, the respective variations explained in brackets) for African goats.
Frontiers in Genetics frontiersin.org07 Sheriff et al. 10.3389/fgene.2024.1353026At the second level, which aimed to obtain a clearer picture of the genetic variation among 11 African populations (Figure 3), the PCA revealed three genetic clusters, viz, 1) East African, 2) North and West African, and 3) South African.In contrast to the global PCA, there is the separation of East African goats from the South African.In this dataset, PC1 and PC2 accounted for 17.93% of the total variation.
The phylogenetic tree generated with TreeMix (Figure 5) revealed four genetic groups in the 13 goat populations.The groups are similar to the genotype groups seen in the PCA plot (Figure 2).As expected, it clearly distinguished Saanen and Tibetan goats.However, it clustered the South and East African goats in one group, separate from the North and West African goats, which were in another group.It also revealed two gene flow events, one from Saanen to Unknown and the other from Thyolo to Guera.Some of the populations, including Saanen, Thyolo, and Tibetan, are shown to have long branches corresponding to the amount of genetic drift.
The f3 test failed to reveal any gene flow events between the populations analyzed (Supplementary Table S4).A complex pattern of gene flow between the study populations, which cannot be explained by a three-way model, may explain this result.The f4 test highlighted possibilities of gene flow among various populations.The highest Z-scores (>|54|) were between Boran and Abergelle and between Unknown and Saanen (Supplementary Table S5).The lowest Z-scores (>|54|) were between Boran and Woyto-Guji and between Saanen and Abergelle.
The graphical summary of the results of ADMIXTURE for 2 ≤ K ≤ 13 is presented in Figure 6.The cross-validation (CV) error was lowest at K = 6 (Supplementary Figure S2), suggesting the presence of six optimal genetic backgrounds in the 13 study goat populations.At this K-value, Saanen and Tibetan each show separate genetic backgrounds, suggesting they are genetically distinct.This separation is also revealed by the PCA (Figure 2) and Treemix (Figure 5).The Guera and Unknown breeds share the third genetic background, but the Unknown breed shows a small proportion of the Saanen background in its genome, which corresponds with the gene flow results from Saanen to Unknown in the TreeMix (Figure 5).Thyolo has the fourth genetic background, while Gumuz, Woyto-Guji, and Boran share the fifth genetic background.The sixth genetic background occurs in Abergelle, Keffa, Fellata, Arab, and Oromo.Some results are noteworthy.At K = 5, Thyolo, Guerra, and Unknown share a common genetic background, but Thyolo is separate with a different background at K = 6-13.The TreeMix also separates Thyolo and shows that it is closer to the populations from Ethiopia and Kenya (Figure 5).Similarly, the PCA completely separates Thyolo, which corresponds with the ADMIXTURE results for K = 6-13 (Figure 3).At K = 6, the background that defines Gumuz, Woyto-Guji, and Boran is observed in Abergelle, Arab, and Keffa, while Fellata shows the presence of the Tibetan background and the background defining Guera and Unknown in its genome.At K = 7, Gumuz diverges from Boran and Woyto-Guji, but its genetic background is now observed in Abergelle, Arab, Keffa, and Oromo.At this K-value, Fellata shares a background with Abergelle and Oromo goats but is separate from Keffa and Arab (Figure 6).This separation is also revealed by both PCA (Figure 4) and TreeMix (Figure 5).Abergelle, on the other hand, clusters with Arab, Keffa, and Oromo in both PCA and TreeMix, but ADMIXURE shows Abergelle with a FIGURE 4 PCA plots for the first two components (PC1 and PC2, the respective variations explained in brackets) for the East African goats.

Selection signatures-F ST approach
A total of 98,660 chromosome windows were assessed for each pairwise comparison (Arab vs. Tibetan, Fellata vs. Tibetan, and Oromo vs. Tibetan).The genome-wide distributions of the standardized FST values across the genomes for each goat population are shown in Supplementary Figure S3.The average FST values were 0.14, 0.11, and 0.13 in Arab, Fellata, and Oromo goats, respectively, suggesting moderate differentiation among the three populations, and this provides insights into their common genetic backgrounds and possible gene flow.Using the top 1% outlier windows as a cutoff threshold (as described earlier) in each population, ~987 windows were assessed per population.Accordingly, genomic regions with high ZF ST (ZF ST ≥ 3.20 (range = 3.20-8.22),3.29 (range = 3.29-7.96),and 3.24 (range = 3.24-7.34);corresponding F ST ≥ 0.40, 0.35, 0.40) in the Arab vs. Tibetan, Fellata vs. Tibetan and Oromo vs. Tibetan, respectively, were defined as selection signatures (Figure 7).Based on these criteria, 987 genomic regions were detected for each of the Arab, Fellata, and Oromo populations (Supplementary Tables S6-S8).Among the three populations, there was variation in the distribution of the regions across the genome.In the Arab and Oromo populations, chromosome 2 showed a strong selection signal, whereas in the Fellata population, chromosomes 8 and 18 showed stronger sweeps than the other chromosomes (Figure 7).

Selection signatures-Hp approach
We also calculated the Z-transformed pooled heterozygosity (ZHp) in 100-kb sliding windows and a 25-kb step size to detect selection sweeps in the three Ethiopian indigenous goat populations.Accordingly, 98,573 windows were detected per population.The overall average Hp values across all the windows were 0.19, 0.19, and 0.18 in the Arab, Fellata, and Oromo goats, respectively (Supplementary Figure S4).Similarly, only outliers falling within the bottom 1% (~986 windows) with low ZHp values (ZHp ≤ −2.73, −2.59, and −2.72; corresponding Hp ≤ 0.08, 0.09, and 0.08 in the Arab, Fellata, and Oromo goats, respectively, were considered signatures of selection.Accordingly, ~983 genomic regions were detected in each population (Supplementary Tables S9-S11). Figure 8 displays the Manhattan plot of the ZHp values for several comparisons in each population.In both the Arab and the Fellata goats, chromosome 15 was shown to have the highest signs for selection signatures, while chromosome 14 was detected in the Oromo population.

Overlapping selection signature regions, genes identified, and enrichment analysis
Based on the overlap of the top 1% ZF ST and the bottom 1% ZHp, 250, 174, and 285 putative selection signature regions were detected for Arab, Fellata, and Oromo goats, respectively (Supplementary Tables S12-S14).Supplementary Tables S15-S17 show a set of functional genes that are novel and reported after annotation of those overlapping genomic regions for Arab, Fellata, and Oromo goat populations, respectively.A total of 206 genes within 65 putative selection signature regions were detected to be associated with Arab goats (Supplementary Table S15).Similarly, 107 genes within 42 candidate genomic regions were identified in Fellata goats (Supplementary Table S16), and 195 genes within 63 genomic regions were identified in Oromo goats (Supplementary Table S17).In total, 508 genes were detected within 170 regions for the three goat populations (Supplementary Tables S15-S17).Among the 170 candidate regions, nine regions ADMIXTURE plot of the studied goat populations in a global context for 2 ≤ K ≤ 13 (EA-G1: East African Group 1; EA-G2: East African Group 2; NWSA: North, West and South African; AS: Asian; EU: European).
Based on the annotation, of the 163 genes, 125 were significantly (p ≤ 0.05, Bonferroni correction) enriched, and we found nine biological process (BP) terms, five cellular component (CC) terms, six molecular function (MF) terms, and one KEGG pathway (Supplementary Tables S18-S21).

Discussion
In the present study, we investigated the genome architecture of three northwestern populations of Ethiopian indigenous goats and benchmarked them against other goat populations from eastern, western, northern, and southern Africa, as well as two exotic breeds, Saanen and Tibetan, from Europe and Asia, respectively.The overall average Ho and He exceeded 0.322, and the within-population genetic variation (D) was above 0.278, suggesting the study populations are highly genetically diverse.The values for Ho and He observed in this study are within the range of global goat diversity (Colli et al., 2018) and close to those reported in Sudanese (Rahmatalla et al., 2017) and Pakistani (Kumar et al., 2018) goats and in the Egyptian Barki goat breed (Kim et al., 2016).The higher variability observed within the Ethiopian goat populations could be attributed to uncontrolled mating that occurs as a result of the common practice of flocks utilizing communal grazing and watering points.The absence of artificial selection, high levels of admixture in these populations, and the introduction and crossbreeding of exotic goats into Ethiopia can also be possible explanations for the increased heterozygosity.Artificial selection within a population may benefit from high variation within populations, especially in areas where community-based goat breeding programs (CBBP) are practiced.CBBP, in which artificial selection occurs within a population, provides a good framework for the implementation of genomic selection in smallholder production systems (Mrode et al., 2018;Rekik et al., 2021).
Runs of homozygosity (ROH) are two contiguous identical by descent (IBD) genomic segments (Gibson et al., 2006) that arise from an increased level of relatedness between individuals within a population or through positive selection (Kijas, 2013).Estimates of ROHs can be used to assist with the interpretation of genomic inbreeding and give insights into population history (Purfield et al., 2012;Zavarez et al., 2015).The latter is particularly important for African indigenous livestock, which are characterized by a lack of written pedigree data (Kosgey et al., 2006).According to Purfield et al. (2012), short ROHs are most likely correlated to ancient inbreeding or potential ancient bottlenecks, whereas long ROHs are more likely associated with relatively recent inbreeding.In the present study, the Ethiopian indigenous goat populations showed their majority of ROHs in the short (100-150 Kb) length category, which is in agreement with the results obtained for other goats (Kim et al., 2016;Brito et al., 2017;Onzima et al., 2018;Islam et al., 2019).Manhattan plots for selection sweep analysis (A) between Arab vs. Tibetan, (B) between Fellata vs. Tibetan, and (C) between Oromo vs. Tibetan goat populations performed using the standardized population differentiation (ZFST) approach.The horizontal line represents the arbitrary threshold for ZFst.
The accumulation of ROHs in the short-length category indicates that the study goats could have been initially established by small founding populations but were not particularly affected by recent inbreeding.
A combination of PCA, ADMIXTURE, and TreeMix tools provided an insight into the genetic structure of the three indigenous goat populations from northwestern Ethiopia and referenced other Ethiopian indigenous goat populations as well as goat populations from Africa and one each from Europe and Asia.The ADMIXTURE tool revealed two distinct genetic clusters.At K = 5, Gumuz and Woyto-Guji were grouped together, while Abergelle, Arab, Fellata, Keffa, and Oromo were in a different cluster and showed some degree of admixture.This result was supported by the TreeMix result.Our finding on these two genetic groups mirrors the previous findings from mitochondrial DNA analyses of 13 Ethiopian goat populations, which identified two haplogroups (A and G) (Tarekegn et al., 2018), suggesting the presence of two deep ancient ancestries in Ethiopia.However, based on the current dataset, it is difficult to infer whether the two groups introduced/arrived in Ethiopia together or independently.The Gumuz and Woyto-Guji populations are geographically isolated (e.g., Gumuz is located in northwestern while Woyto-Guji is located in southern Ethiopia), but they clustered together and were separate from other Ethiopian goat populations.This could be due to their unique genetic compositions and similarity of production environments.We found that the heterozygosity values for Gumuz and Woyto-Guji were very similar (Ho = 0.334 ± 0.025 and 0.335 ± 0.009 for Gumuz and Woyto-Guji, respectively).This was further confirmed by comparable inbreeding coefficient and number of ROHs.The goats also inhabit similar agro-ecology (semi-arid), and both of them are kept by agro-pastoralists.The clustering of Arab, Fellata, and Oromo goats in one group is attributed to geographical proximity.Home tracts of the three goat populations tend to overlap, which may facilitate ease of flock exchange between farmers and favor gene flow among the populations.In general, the study goat populations are characterized by a high level of admixture and a lack of phylogeographic structure, and this agrees with Tarekegn et al. (2018), Luikart et al. (2001), and Naderi et al. (2007;2008), who reported similar findings in different goat breeds.
We did find a total of 170 overlapping genomic regions in the three populations, spanning 508 candidate genes, by combining the two approaches.Comparable results have been reported for livestock species from similar environments (Kim et al., 2016;Mwacharo et al., 2017;Ayalew et al., 2023).Nine of 170 regions were identified as the strongest signals and covered 163 functional genes.We speculate that these 163 genes represent past and/or on-going selection in the studied goat populations.Many of the 163 candidate genes were associated with diverse physiological, molecular, and cellular processes and pathways (Supplementary Tables S18-S21).This shows that adaptation to semiarid tropical environmental stressors such as heat, solar radiation, physical exhaustion, feed and water scarcity, parasites, and others is complex and may involve many genomic regions and genes with pleiotropic activities.It also demonstrates the intricacy of adaptation, Manhattan plots performed using the standardized pool heterozygosity (ZHp) approach for each 100-kb sliding window with a 25-kb step size across all autosomes in the (A) Arab, (B) Fellata, and (C) Oromo goat populations.The horizontal line represents the arbitrary threshold for ZHp.
Frontiers in Genetics frontiersin.org12 Sheriff et al. 10.3389/fgene.2024.1353026involving numerous biological processes and quantitative trait loci, each of which has a small but cumulative effect on the overall phenotypic expression.Among the significant GO terms, the top three that are related to tropical environment adaptations include anterior/posterior pattern specification (GO:0009952), positive regulation of natural killer cell proliferation (GO:0032819), and positive regulation of natural killer (NK) T cell proliferation (GO:0051142) (Supplementary Table S18).
Based on their biological functions and information from published studies, several genes that have been reported before and are possibly responsible for the important traits in goats and other domestic livestock were presented.

Thermo-tolerance genes
Numerous inherent genetic endowments in the Ethiopian indigenous goat populations could be harnessed for better adaptation in semi-arid tropical environments.We identified some of these genetic biomarkers, notably the HOXC-cluster (homeobox genes) (HOXC12, HOXC13, HOXC4, HOXC6, and HOXC9) and MAPK8IP2 (Supplementary Table S18), that are associated with heat stress.The HOXC-cluster was present in Arab and Oromo goat populations and was also pinpointed by GO analysis (GO:0009952), revealing the term anterior/posterior pattern specification.Onzima et al. (2018) reported HOXC12 and HOXC13 genes for their involvement in anterior/posterior pattern specification in the Sebei goat breed of Uganda.The genes can also regulate essential traits like keratin and hair follicle differentiation in goats (Wu et al., 2009), sheep (Sander and Powell, 2004), and cattle (Taye et al., 2017;Ayalew et al., 2023).More specifically, the HOXC13 gene has been reported to influence skin thickness and number of hair follicles in animals.Skin, being the intermediate between the animal body and the surrounding environment, influences thermoregulation positively (Alfonzo et al., 2016) and plays a major role in aiding the adaptation of animals to heat stress.Cattle with relatively thicker skin, such as the thermo-tolerant Bos indicus, exhibit better thermoregulation than cattle with thinner skin, like the heat-sensitive Bos taurus (Alfonzo et al., 2016).The present study identifies HOXC13 as one of the biomarkers for thermo-tolerance, which could further aid in long-term breeding goals towards developing agro-ecological zone-specific goat breeds in the study area.The other gene, MAPK8IP2, is identified in the candidate genomic region of the Fellata goat population, and it has been found to be involved in different aspects of thermo-tolerance.This gene has been shown to interact with MAPK8IP1 (Yasuda et al., 2000).The latter is involved in regulating the reaction of cells to heat stress, which raises the transcription activity of multiple heat stress-responsive genes that regulate various processes, such as cell survival, proliferation, and apoptosis in Holstein cows involved in milk production under heat stress (Sigdel et al., 2019).When cells are subjected to heat stress, they release reactive oxygen species (ROS), which result in cellular necrosis and, ultimately, cell death.Interestingly, gene MAPK8IP1 is involved in suppressing heat stressinduced ROS production and cellular apoptosis (Li et al., 2018).

Immune response genes
Indigenous goats are known to be well-adapted to various environmental stressors, including disease.A member of interleukins (IL18) and TYK2 were present in one of the candidate regions identified in Arab and Oromo goat populations.Interleukins are expressed by leukocytes (Maggio et al., 2006) and are likely involved in the activation of the immune response of goats (Amiri et al., 2023) and chickens (Susta et al., 2015;Chen et al., 2016;Truong et al., 2016).In a study conducted on Malabari goats during heat stress, IL18 was identified as a reliable immunological marker that aids in assessing heat stress-mediated immune response alterations (Rashamol et al., 2019).Similarly, Madhusoodan et al. (2019) reported a significant reduction in hepatitic IL18 gene expression during heat stress in Salem black goats (Madhusoodan et al., 2019).This gene is considered an important inflammatory marker for quantifying the impact of heat stress on the immune system in goats (Bagath et al., 2019).Similarly, TYK2 plays a vital role in the innate and acquired responses of humans (Lu et al., 2015), including inflammatory conditions resulting from viral infections and autoimmune diseases.Some other important genes identified in the present study, known to be associated with immune response, are members of ICAM, such as ICAM3 in Arab and Oromo goat populations and subfamily members of ADGRG, including ADGRG1 and ADGRG3 in Fellata goats.ICAM3, the gene constitutively expressed on the surface of leukocytes (Xiao et al., 2013), is important in generating an immune cell response (Montoya et al., 2002) through its facilitation of interactions between T cells and dendritic cells (Svajger et al., 2010).The other gene, ADGRG1, plays a role in immune regulation (Lala and Hall, 2022) and is expressed in cytotoxic lymphocytes, including natural killer (NK) cells (Della Chiesa et al., 2010).ADGRG3 has been shown to exert robust effects on B cell development (Wang et al., 2013).Genetic deletion of ADGRG3 reduces macrophage migration into white adipose tissue (Shi et al., 2016).The fact that ADGRG3 is robustly expressed in multiple immune cells is fascinating with regard to the recent revelation that ADGRG3 can be activated by glucocorticoids (Ping et al., 2021), as glucocorticoids are known to exert powerful effects on the physiology of many different cell types in the immune system.

Genes associated with production and reproduction
We detected genes like RARG and DNMT1 in Arab and Oromo goats, which are associated with production and reproduction.RARG plays an important role in milk production, body size, and kidding in Liaoning cashmere goats (Chen et al., 2022).The expression of this gene has been suggested to be required for the development of limb buds and skeletal growth in dairy cows (Bionaz et al., 2015).Furthermore, RARG was identified as the most important master regulator of quantitative trait loci (QTL) for milk production in the F 2 population of German Holstein × Charolais crossed cows (Brand et al., 2016).This gene was also associated with litter size in pigs (Rothschild et al., 1997) and the development of bovine embryos (Mohan et al., 2001).In other species, such as mice, RARG was reported to be of functional significance involved in spermatogenesis (Aurore et al., 2012), formation of normal alveoli and alveoli elastic fibers in the lung (McGowan et al., 2000), hematopoietic development (Purton, 2007), and correct formation of the axial skeleton, including anteriorization of the cervical and thoracic vertebrae (Wendling et al., 2001).In contrast, its absence in mice resulted in growth deficiency (Lohnes et al., 1993), bone mass reduction (Green et al., 2015), reduced chondrocyte proliferation, decreased expression and deposition of proteo1glycans (Williams et al., 2009), and male sterility (Lohnes et al., 1993).The other gene, DNMT1, consistently appears to be involved in reproduction (Kay et al., 2018;Schulz et al., 2018;Bewick et al., 2019;Amukamara et al., 2020;Washington et al., 2021) and egg production and egg viability in B. tabaci.DNMT1 knockdown affected testis size and structure (Washington et al., 2021).

Conclusion
We used whole-genome sequence data to investigate the level of genetic diversity, population structure, and signatures of selection in three Ethiopian indigenous goat populations.We observed high within-breed genetic diversity, low genetic differentiation, a high level of admixture, and lack of phylogeographic structure in the studied goat populations.The genomic data also identified several potential candidate genes possibly under selection, including adaptation to a hot environment (homeobox genes and MAPK8IP2), immune response (IL18, TYK2, ICAM3, and ADGRG subfamilies), and production and reproduction (RARG and DNMT1).Our study is important in the design of goat genetic improvement programs in view of the current and future predicted effects of climate change.It also provides a foundation for future studies to investigate the genome architectures of different ruminant species coexisting in a similar environment.

FIGURE 5
FIGURE 5TreeMix maximum likelihood phylogenetic tree showing the relationships among the 13 goat populations.Horizontal branch lengths are proportional to the amount of genetic drift that has occurred along that branch.The scale bar on the left shows 10 times the average standard error (s.e.) of the entries in the sample covariance matrix.Two migration edges between populations are shown with arrows pointing in the direction of the recipient group and colored according to the ancestry percentage received from the donor.

TABLE 1
Estimates of genetic diversity parameters for each of the 13 populations analyzed in this study.

TABLE 2
Statistics of ROH observed in the study goat populations under different length classes.