Capture Sequencing to Explore and Map Rare Casein Variants in Goats

Genetic variations in the four casein genes CSN1S1, CSN2, CSN1S2, and CSN3 have attracted substantial attention because they affect milk protein yield, milk composition, cheese-processing properties, digestibility, and tolerance in human nutrition. Furthermore, milk protein variants are used for breed characterization, biodiversity, and phylogenetic studies. This study aimed to identify casein protein variants in five domestic goat breeds from Sudan (Nubian, Desert, Nilotic, Taggar, and Saanen) and three wild goat species [Capra aegagrus aegagrus (Bezoar ibex), Capra nubiana (Nubian ibex), and Capra ibex (Alpine ibex)]. High-density capture sequencing of 33 goats identified a total of 22 non-synonymous and 13 synonymous single nucleotide polymorphisms (SNPs), of which nine non-synonymous and seven synonymous SNPs are new. In the CSN1S1 gene, the new non-synonymous SNP ss7213522403 segregated in Alpine ibex. In the CSN2 gene, the new non-synonymous SNPs ss7213522526, ss7213522558, and ss7213522487 were found exclusively in Nubian and Alpine ibex. In the CSN1S2 gene, the new non-synonymous SNPs ss7213522477, ss7213522549, and ss7213522575 were found in Nubian ibex only. In the CSN3 gene, the new non-synonymous SNPs ss7213522604 and ss7213522610 were found in Alpine ibex. The identified DNA sequence variants led to the detection of nine new casein protein variants. New variants were detected for alpha S1 casein in Saanen goats (CSN1S1*C1), Bezoar ibex (CSN1S1*J), and Alpine ibex (CSN1S1*K); for beta and kappa caseins in Alpine ibex (CSN2*F and CSN3*X); and for alpha S2 casein in all domesticated and wild goats (CSN1S2*H), in Nubian and Desert goats (CSN1S2*I), and in Nubian ibex only (CSN1S2*J and CSN1S2*K). The results show that most novel SNPs and protein variants occur in the wild Nubian and Alpine ibex, including the critically endangered Nubian ibex. This highlights the importance of preserving this endangered species. We further suggest validating and characterizing the new casein protein variants.

INTRODUCTION
Further details on the native goat breeds can be found in Rahmatalla et al. (2017). The Saanen goats used in this study were imported to Sudan from the Netherlands. Bezoar ibex is considered the ancestor of today's domesticated goat breeds, whereas Nubian ibex inhabit mountainous regions in Sudan. Alpine ibex were included for comparison with Nubian ibex because both species live in high mountain areas.
High-density capture sequencing was used to identify genetic variants in the casein genes. Because protein samples were not examined, protein variants were predicted from DNA polymorphisms using bioinformatics tools. Identifying such variation at the DNA and protein levels is the first step toward association studies, which will provide further information about the effects of specific protein variants on milk characteristics and support their use in breed improvement. Moreover, this information can inform conservation decisions and help elucidate the evolution of the genus Capra.
sequences (version LWLT01) (Bickhart et al., 2017) available at the National Center for Biotechnology Information (NCBI; https://www.ncbi.nlm.nih.gov/). The targeted regions covered 5,000 bp upstream of the transcription start site and 1,000 bp downstream of the 3′-UTR of each casein gene. To generate sequencing libraries, 500 ng of genomic DNA was sheared by sonication (Covaris M220, Covaris, Woburn, MA, United States) for 75 s (20% duty factor, 200 cycles per burst) and further processed with the Accel-DNA 1S kit (Swift Biosciences, Ann Arbor, MI, United States) according to the manufacturer's instructions. The resulting whole-genome libraries were barcoded, pooled in equimolar amounts, and hybridized to the tiling array. Briefly, the libraries were hybridized for 65 h at 65 °C, washed, and eluted with nuclease-free water for 10 min at 95 °C. The eluted DNA was concentrated in a vacuum centrifuge, amplified by PCR (10 cycles of 98 °C for 15 s, 65 °C for 30 s, and 72 °C for 30 s), and purified with AMPure XP beads.
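The target-region rule above (5,000 bp upstream of the transcription start site and 1,000 bp beyond the 3′-UTR) amounts to a strand-aware padding of each gene interval. The Python sketch below illustrates only that calculation. The gene coordinates are hypothetical placeholders, not LWLT01 annotation values, and expressing the result as a BED interval (0-based, half-open) is our assumption about how such targets might be handed to a probe-design tool.

```python
from dataclasses import dataclass

UPSTREAM_BP = 5_000    # flank before the transcription start site
DOWNSTREAM_BP = 1_000  # flank after the end of the 3'-UTR


@dataclass
class Gene:
    name: str
    chrom: str
    start: int   # 1-based, inclusive: lowest genomic coordinate of the transcript
    end: int     # 1-based, inclusive: highest genomic coordinate of the transcript
    strand: str  # "+" or "-"


def capture_interval(gene: Gene) -> tuple:
    """Return a padded target as a BED-style (chrom, start0, end, name) tuple."""
    if gene.strand == "+":
        # TSS is at gene.start; the 3'-UTR ends at gene.end
        lo, hi = gene.start - UPSTREAM_BP, gene.end + DOWNSTREAM_BP
    elif gene.strand == "-":
        # On the minus strand the TSS is at gene.end and the 3'-UTR ends at gene.start
        lo, hi = gene.start - DOWNSTREAM_BP, gene.end + UPSTREAM_BP
    else:
        raise ValueError(f"unknown strand: {gene.strand!r}")
    lo = max(lo, 1)  # do not run past the start of the chromosome
    return (gene.chrom, lo - 1, hi, gene.name)  # BED start is 0-based


# Hypothetical coordinates, for illustration only
example = Gene("geneX", "chr6", 10_000, 20_000, "+")
print(capture_interval(example))
```

Handling strand explicitly matters because "upstream" refers to transcript orientation, not to lower genomic coordinates.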

Data Analysis
FASTQ sequencing files were demultiplexed based on their barcodes, reads were trimmed with Trimmomatic, and the trimmed reads were aligned to the LWLT01 reference genome with BWA. The BAM files containing the raw aligned reads per sample formed the input for our variant-calling pipeline. Variants were called with BCFtools, and the resulting raw single nucleotide polymorphism (SNP) calls were filtered with the varFilter tool of the vcfutils package to remove all off-target calls (Danecek et al., 2011).
Two settings in the SNP-calling step were adjusted: the minimum read mapping quality (-q) in the mpileup step was increased to 30 (default: 0), and varFilter in the vcfutils step was run with a minimum depth (-d) of 10 reads (default: 2) in a sample before a SNP was called. All other SNP-filtering settings were left at their default values. Consequently, every SNP called in a sample is supported by at least 10 reads with a quality of at least 30. All SNPs had a QUAL score of at least 300 in the resulting VCF file. The average read depth across all SNPs is shown in Supplementary Figure 2. Polymorphisms were validated visually using the Integrative Genomics Viewer (IGV) (Robinson et al., 2011). The positions of the identified sequence variants presented in this paper are based on the LWLT01 genome. Known sequence variants were annotated based on the Single Nucleotide Polymorphism database (dbSNP, build 143) and are reported with their rs identifiers; novel SNPs are reported with their European Variation Archive (EVA) ss identifiers.
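As a minimal post-hoc check of the thresholds described above, the Python sketch below keeps only single-base substitutions with QUAL ≥ 300 and a read depth of at least 10. It is not the authors' BCFtools/vcfutils pipeline: it reads plain VCF text lines and uses the site-level INFO/DP field as a stand-in for read depth (an assumption; per-sample FORMAT/DP values would allow a stricter, sample-wise check). The example record is invented.

```python
NUCLEOTIDES = {"A", "C", "G", "T"}


def parse_info(info: str) -> dict:
    """Split a VCF INFO column (KEY=VALUE;FLAG;...) into a dictionary."""
    fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value  # flags without '=' map to an empty string
    return fields


def passes_filters(vcf_line: str, min_qual: float = 300.0, min_depth: int = 10) -> bool:
    """Return True for a biallelic SNP record meeting the QUAL and depth thresholds."""
    cols = vcf_line.rstrip("\n").split("\t")
    ref, alt, qual, info = cols[3], cols[4], cols[5], cols[7]
    # Keep single-nucleotide substitutions only (no indels, no multi-allelic ALT, no '.')
    if ref.upper() not in NUCLEOTIDES or alt.upper() not in NUCLEOTIDES:
        return False
    if qual == "." or float(qual) < min_qual:
        return False
    depth = parse_info(info).get("DP", "")
    return depth.isdigit() and int(depth) >= min_depth


record = "chr6\t1000\t.\tA\tG\t350\t.\tDP=25;MQ=60"  # invented example record
print(passes_filters(record))
```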
Subsequently, the high-confidence novel SNPs were combined and annotated with a custom R script based on the "seqinr" and "Biostrings" R packages. In short, the GFF3 file containing the gene, exon, and CDS locations for goat was combined with the SNPs from the VCF file in R 4.0.2. Amino acid changes were determined by translating codons with the "Biostrings" package (Pagès et al., 2020) using "The Standard Genetic Code" codon table.
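The codon-level classification described above can be illustrated in Python; this is a stand-in for the authors' R/Biostrings script, not a reproduction of it. The sketch builds the standard genetic code (the same table Biostrings provides as GENETIC_CODE) and classifies a single-nucleotide change within a coding sequence. It assumes that the CDS and the alternative base are given in transcript orientation, so variants in minus-strand genes would first need to be complemented. The toy CDS is hypothetical.

```python
BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), codons ordered TTT, TTC, TTA, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def classify_snv(cds: str, pos: int, alt: str) -> tuple:
    """Classify a substitution at 0-based CDS position `pos` to base `alt`.

    Returns (effect, codon_number, ref_aa, alt_aa); codon numbers are 1-based
    and count from the first codon of the CDS (i.e., the precursor protein).
    """
    cds, alt = cds.upper(), alt.upper()
    if cds[pos] == alt:
        raise ValueError("alternative base equals the reference base")
    offset = pos % 3
    start = pos - offset
    ref_codon = cds[start:start + 3]
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        effect = "synonymous"
    elif alt_aa == "*":
        effect = "stop-gained"
    else:
        effect = "non-synonymous"
    return effect, start // 3 + 1, ref_aa, alt_aa


toy_cds = "ATGCTTCAA"  # hypothetical: Met-Leu-Gln
print(classify_snv(toy_cds, 3, "G"))  # ('non-synonymous', 2, 'L', 'V'): CTT -> GTT
print(classify_snv(toy_cds, 8, "G"))  # ('synonymous', 3, 'Q', 'Q'): CAA -> CAG
```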
The effects of the novel non-synonymous SNPs on protein function were predicted with the PROVEAN tool (Choi and Chan, 2015); the results are available in Supplementary Table 2. Potential effects of the amino acid substitutions on peptide-chain hydrolysis and cleavage were assessed in silico with the PeptideCutter program on the ExPASy Bioinformatics Resource Portal. The isoelectric focusing (IEF) behavior of the new kappa casein variants was not tested experimentally; instead, ExPASy was used to predict whether a variant belongs to the A or the B IEF group of kappa casein (Gasteiger et al., 2003).
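To make the idea of a predicted isoelectric point concrete, the Python sketch below finds the pH at which a peptide's net charge is zero by bisection, using the Henderson-Hasselbalch relation. The pKa values are textbook values for free amino acids and are an assumption: ExPASy's Compute pI/Mw tool uses a different pK scale (Bjellqvist et al.), so numbers from this sketch will not match ExPASy output, and phosphorylation, which strongly affects caseins, is ignored. The peptides are toy examples.

```python
# Assumed pKa values (textbook values for free amino acids); not ExPASy's scale
PKA = {
    "N_term": 9.69, "C_term": 2.34,
    "D": 3.65, "E": 4.25, "C": 8.18, "Y": 10.07,  # acidic groups (lose H+)
    "H": 6.00, "K": 10.53, "R": 12.48,            # basic groups (gain H+)
}
BASIC = {"H", "K", "R"}
ACIDIC = {"D", "E", "C", "Y"}


def net_charge(seq: str, ph: float) -> float:
    """Net charge of an unmodified peptide at a given pH."""
    charge = 1.0 / (1.0 + 10 ** (ph - PKA["N_term"]))
    charge -= 1.0 / (1.0 + 10 ** (PKA["C_term"] - ph))
    for aa in seq.upper():
        if aa in BASIC:
            charge += 1.0 / (1.0 + 10 ** (ph - PKA[aa]))
        elif aa in ACIDIC:
            charge -= 1.0 / (1.0 + 10 ** (PKA[aa] - ph))
        # other residues (e.g., Ser, Thr, Asn) have no ionizable side chain in this model
    return charge


def isoelectric_point(seq: str, tol: float = 1e-4) -> float:
    """Bisection on pH 0-14; the net charge decreases monotonically with pH."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(round(isoelectric_point("ACDEHKR"), 2))
```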

DNA Sequence Variants in the Casein Gene Regions
In total, 647 SNPs relative to the goat reference sequence were detected in the 80,685 bp sequenced across the casein gene regions of CSN1S1, CSN2, CSN1S2, and CSN3. Most of the detected variants were located in introns (76.82%), followed by variants in the upstream gene region (14.84%) and non-synonymous variants (3.40%). The remaining SNPs were synonymous variants (2.01%) and variants located in the 3′-UTR (2.01%) and the 5′-UTR (0.92%).

CSN1S1
The reference sequence for CSN1S1 (accession no. NC_030813) represents the alpha S1 casein variant CSN1S1*A (XP_017904616), including the signal peptide. Sequence analysis of 22,807 bp revealed 226 SNPs (9.9 SNPs per 1,000 sequenced bp). Of these, six were non-synonymous (Table 2) and seven synonymous (Table 3), while 32 were located in the upstream region, four in the 3′-UTR, and 177 in introns (Supplementary Table 3).
Among the non-synonymous SNPs, one novel SNP was detected in wild Alpine ibex. It is located at position CHR6:85984154 (ss7213522403, exon 7) and causes the amino acid substitution Ile44Val in the mature alpha S1 casein protein. The known non-synonymous SNPs rs155505536 and rs268293072 were found in all Sudanese breeds, Saanen goats, and Bezoar ibex; rs155505532 segregated in Sudanese breeds and Saanen goats; and rs268293069 and rs655973384 were found in Saanen goats only (Table 2).
In the CSN1S1 gene, five of the seven synonymous SNPs were novel [ss7213522449 (exon 2), ss7213522443 (exon 12), ss7213522421 (exon 15), ss7213522397 (exon 17), and ss7213522438 (exon 17)]. The SNPs ss7213522443, ss7213522421, ss7213522397, and ss7213522438 are synonymous changes in the codons for Lys102, Asn139, Tyr165, and Pro168 of the mature protein, respectively, and ss7213522449 is located in the codon for Leu6 of the signal peptide. Three of the five novel synonymous SNPs (ss7213522443, ss7213522421, and ss7213522397) segregated in Sudanese breeds, Saanen goats, and Bezoar ibex (Table 3). The novel synonymous SNP ss7213522449 was identified in Nubian ibex and Alpine ibex, while the novel SNP ss7213522438 was found in Nubian and Nilotic goats only. The known synonymous SNP rs672288350 segregated in Sudanese breeds, and rs155505533 was found in Sudanese breeds and Saanen goats (Table 3).
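Because the reference sequences include the signal peptide while substitutions are numbered in the mature protein, positions must be shifted by the signal-peptide length. A minimal Python sketch of that mapping follows. The signal-peptide length is a parameter; the value of 15 residues used in the example is an assumption for illustration and should be checked against the reference annotation of each casein.

```python
def to_mature_position(precursor_pos: int, signal_length: int) -> tuple:
    """Map a 1-based precursor residue number to (region, residue number).

    Residues within the signal peptide keep their precursor numbering;
    residues after it are renumbered from 1 in the mature protein.
    """
    if precursor_pos < 1:
        raise ValueError("residue numbers are 1-based")
    if precursor_pos <= signal_length:
        return "signal peptide", precursor_pos
    return "mature protein", precursor_pos - signal_length


SIGNAL_LENGTH = 15  # assumed signal-peptide length, for illustration only
print(to_mature_position(6, SIGNAL_LENGTH))   # ('signal peptide', 6)
print(to_mature_position(59, SIGNAL_LENGTH))  # ('mature protein', 44)
```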

CSN2
The reference sequence for the CSN2 gene (accession no. NC_030813) represents the beta casein variant CSN2*C (XP_005681778), including the signal peptide. Sequencing of 15,071 bp revealed 109 SNPs (7.23 SNPs per 1,000 sequenced bp). Of these, five were non-synonymous (Table 2) and one synonymous (Table 3), while eight were located in the upstream region, two in the 3′-UTR, and 93 in introns (Supplementary Table 4). Three of the five non-synonymous SNPs were novel. The novel SNPs ss7213522526 (exon 1), ss7213522558 (exon 1), and ss7213522487 (exon 7) (Table 2) cause the amino acid substitutions Leu11Val and His17Arg in the signal peptide and Pro148Leu in the mature beta casein protein, respectively. All three novel SNPs were found in Alpine ibex, and the first two also in Nubian ibex. PROVEAN predicts a deleterious effect on protein function for the novel SNP ss7213522487 in exon 7 (PROVEAN score = −4.947; Supplementary Table 2). The known non-synonymous SNP rs652629715 segregated in most domesticated breeds (except Nilotic and Taggar) and in all wild species, and rs155505539 was found in all domesticated breeds and in Bezoar and Alpine ibex.
The only synonymous SNP in the CSN2 gene, the novel ss7213522504 (exon 1), is located in the codon for Gln13 of the signal peptide. It was found in Nubian ibex and Alpine ibex.

CSN1S2
The reference sequence for CSN1S2 (accession no. NC_030813) represents the CSN1S2*A variant of the alpha S2 casein protein (XP_013820127), including the signal peptide. Sequencing of 22,694 bp revealed 193 SNPs compared to the reference sequence (8.5 SNPs per 1,000 sequenced bp). Of these, six were non-synonymous (Table 2), 38 were in the upstream region, four in the 5′-UTR, five in the 3′-UTR, and 140 in introns (Supplementary Table 5). Three of the six non-synonymous SNPs were novel. The novel SNPs ss7213522477 (exon 4), ss7213522549 (exon 5), and ss7213522575 (exon 16) cause the amino acid substitutions Phe17Ser, Ile20Thr, and Ser169Asn in the mature alpha S2 casein protein, respectively (Table 2). All three novel SNPs were found in Nubian ibex. Among the known non-synonymous SNPs, rs640625134 was detected in Taggar and Saanen goats, rs659163710 in all domesticated breeds and wild species, and rs665830654 in Nubian and Desert goats as well as in Nubian ibex (Table 2). Although these three SNPs had identifiers in the Ensembl database, none of them had been assigned to the CSN1S2 gene. We therefore provide their annotation here: rs640625134 (CHR6:86079098T > C, exon 2), rs659163710 (CHR6:86085160G > C, exon 11), and rs665830654 (CHR6:86085714G > A, exon 12) cause the amino acid substitutions Phe4Ser in the signal peptide and Ala119Pro and Glu127Lys in the mature alpha S2 casein protein, respectively. No synonymous SNP was detected in the CSN1S2 gene in this study.

CSN3
The reference sequence for CSN3 (accession no. NC_030813) encodes the CSN3*B kappa casein variant (NP_001272516), including the signal peptide. Sequencing of 20,113 bp of CSN3 revealed 119 SNPs compared to the reference sequence (5.9 SNPs per 1,000 sequenced bp). Of these, five were non-synonymous (Table 2) and five synonymous (Table 3), while 20 were located in the upstream region, two in the 3′-UTR, and 87 in introns (Supplementary Table 6).
Interestingly, all five non-synonymous SNPs and four of the five synonymous SNPs reside in exon 4. Two of the five non-synonymous SNPs were novel and were identified in Alpine ibex only (Table 2): ss7213522604 and ss7213522610 cause the amino acid substitutions Ser33Asn and Ser37Thr in the mature kappa casein protein, respectively. The known non-synonymous SNP rs268293109 was found in Taggar and Saanen goats, and rs268293113 and rs651045868 in Nubian and Desert goats.
In the CSN3 gene, we also detected the novel synonymous SNP ss7213522597 (exon 4) in the codon for Asp20 of the mature protein in Nubian ibex. The known synonymous SNP rs663488235 (exon 3) segregated in Desert, Nilotic, Taggar, and Saanen goats as well as in Bezoar ibex; rs155505563 (exon 4) in Nilotic goats; rs268293107 in Nilotic and Taggar goats; and rs268293108 in Nubian, Desert, Taggar, and Saanen goats as well as in Nubian and Alpine ibex (Table 3).
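The per-kilobase SNP densities reported for the four genes follow directly from the sequenced lengths and SNP counts given in the gene sections above. The short Python sketch below recomputes them; summing the per-gene values reproduces the totals of 647 SNPs in 80,685 bp.

```python
# Values copied from the gene sections above: (sequenced bp, SNPs detected)
COUNTS = {
    "CSN1S1": (22_807, 226),
    "CSN2": (15_071, 109),
    "CSN1S2": (22_694, 193),
    "CSN3": (20_113, 119),
}


def snps_per_kb(bp: int, snps: int) -> float:
    """SNP density expressed per 1,000 sequenced base pairs."""
    return 1_000 * snps / bp


for gene, (bp, snps) in COUNTS.items():
    print(f"{gene}: {snps_per_kb(bp, snps):.2f} SNPs per 1,000 bp")

total_bp = sum(bp for bp, _ in COUNTS.values())
total_snps = sum(snps for _, snps in COUNTS.values())
print(f"All casein genes: {total_snps} SNPs in {total_bp:,} bp "
      f"({snps_per_kb(total_bp, total_snps):.2f} per 1,000 bp)")
```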

Casein Protein Variants
The identified DNA sequence variants led to the recognition of 18 casein protein variants, nine of them new: six alpha S1 casein variants, three beta casein variants, five alpha S2 casein variants, and four kappa casein variants (Figure 2). The frequencies of the protein variants differed widely (Table 4).
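Protein variant frequencies are commonly estimated by allele counting, that is, by dividing the number of copies of each variant by twice the number of diploid animals. The Python sketch below shows that calculation under the assumption that each animal's two protein-variant haplotypes are already resolved (for variants defined by several SNPs this requires phasing). The genotypes are invented and do not correspond to any breed in this study.

```python
from collections import Counter


def variant_frequencies(genotypes: list) -> dict:
    """Allele-counting estimate of variant frequencies.

    `genotypes` holds one (variant1, variant2) pair per diploid animal.
    """
    counts = Counter(variant for pair in genotypes for variant in pair)
    total = 2 * len(genotypes)
    return {variant: n / total for variant, n in sorted(counts.items())}


# Invented genotypes for four animals, for illustration only
herd = [("CSN3*B", "CSN3*B"), ("CSN3*B", "CSN3*K"), ("CSN3*B", "CSN3*N"), ("CSN3*B", "CSN3*B")]
print(variant_frequencies(herd))
```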

Alpha S1 Casein
Based on the DNA sequence information, we identified five amino acid substitutions, which defined six alpha S1 casein protein variants (Table 4). Three of these variants were new and distinct from the CSN1S1*A reference. A variant carrying Ile8, Pro16, Glu77, and Lys100 resembled CSN1S1*C, differing only by threonine instead of alanine at position 195; it was therefore named CSN1S1*C1 and was found in Saanen goats. The variant carrying Glu77 and Lys100 was identified in Bezoar ibex only and was named CSN1S1*J. The variant carrying Val44 was observed in wild Alpine ibex and was named CSN1S1*K. Among the known variants, CSN1S1*A was found in Sudanese breeds and in Bezoar, Nubian, and Alpine ibex; CSN1S1*B2 in Saanen goats only; and CSN1S1*B3 in Sudanese breeds and Saanen goats, but not in Bezoar, Nubian, or Alpine ibex. The CSN1S1*B3 and CSN1S1*C1 variants always occurred together with the alleles G, T, and T of the newly identified synonymous SNPs at positions CHR6:85988712, CHR6:85991559, and CHR6:85993377, respectively, whereas the reference sequence encoding CSN1S1*A carries the alleles A, C, and C at these positions.
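The naming logic above (an observed amino acid haplotype receives an existing name only if it matches a known variant exactly; otherwise it is compared with its closest known relative) can be sketched in Python. Only the two definitions stated in the text relative to CSN1S1*A are included (CSN1S1*J: Gln77Glu and Arg100Lys; CSN1S1*K: Ile44Val); the other variants are omitted because their full definitions are not given here, and the substitution at position 150 in the example is hypothetical.

```python
from typing import Dict, Optional

# Differences from the CSN1S1*A reference (mature-protein numbering), listing
# only non-reference residues, as stated in the text.
KNOWN_VARIANTS: Dict[str, Dict[int, str]] = {
    "CSN1S1*A": {},
    "CSN1S1*J": {77: "E", 100: "K"},
    "CSN1S1*K": {44: "V"},
}


def n_differences(a: Dict[int, str], b: Dict[int, str]) -> int:
    """Count positions at which two variants (as non-reference residues) differ."""
    return sum(a.get(pos) != b.get(pos) for pos in a.keys() | b.keys())


def assign_variant(observed: Dict[int, str]) -> Optional[str]:
    """Return the name of an identical known variant, or None if it is new."""
    for name, diffs in KNOWN_VARIANTS.items():
        if n_differences(observed, diffs) == 0:
            return name
    return None


def closest_variant(observed: Dict[int, str]) -> str:
    """Known variant with the fewest differing positions (first one on ties)."""
    return min(KNOWN_VARIANTS, key=lambda name: n_differences(observed, KNOWN_VARIANTS[name]))


print(assign_variant({77: "E", 100: "K"}))   # CSN1S1*J
print(assign_variant({44: "V", 150: "L"}))   # None (hypothetical new haplotype)
print(closest_variant({44: "V", 150: "L"}))  # CSN1S1*K
```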

Beta Casein
For beta casein, the two known variants CSN2*A and CSN2*C and one new variant were found. The new variant, detected in Alpine ibex only (Table 4), carries Leu148 and Ala177 and was named CSN2*F. CSN2*F is always linked with allele T of the novel synonymous SNP ss7213522504 at position CHR6:86015270, in the codon for Gln13 of the signal peptide. The known variants CSN2*A and CSN2*C were found in Sudanese breeds, Saanen goats, and Bezoar ibex; CSN2*C was additionally found in Nubian ibex (Table 4).

Alpha S2 Casein
Five protein variants were found for alpha S2 casein, four of which are presented here for the first time.

Kappa Casein
Four protein variants were found for kappa casein, one of which was new. This new variant, identified in Alpine ibex, was most similar to CSN3*B but carried asparagine and threonine at positions 33 and 37, respectively; it was named CSN3*X (Table 4). The known variant CSN3*B was fixed in Nilotic goats, Bezoar ibex, and Nubian ibex and was the most common variant in Nubian, Desert, Taggar, and Saanen goats. CSN3*K was detected only in Taggar and Saanen goats. CSN3*N (named according to the new nomenclature of Gautam et al., 2019; called CSN3*M by Kiplagat et al., 2010) was found only in Nubian and Desert goats (Table 4).

DISCUSSION
Knowledge of how different protein variants affect human health and nutrition can guide the selection of animals and the development of niche products.
The new alpha S1 casein variant CSN1S1*J detected in Bezoar ibex carries the amino acid substitutions Gln77Glu and Arg100Lys (compared with CSN1S1*A). Because glutamine and glutamic acid are both polar, and arginine is similar to lysine (both have long, flexible side chains with a positively charged end), we do not expect CSN1S1*J to have markedly different biochemical properties from CSN1S1*A; however, this expectation needs to be investigated further. Another new variant, CSN1S1*K, detected in Alpine ibex, carries the substitution Ile44Val (compared with CSN1S1*A). Both amino acids have hydrophobic, branched aliphatic side chains, and the biochemical properties of isoleucine and valine are similar. Major biochemical differences between CSN1S1*K and CSN1S1*A are therefore not expected. Interestingly, the variant CSN1S1*B3, detected at high frequency in all Sudanese breeds and Saanen goats, has previously been associated with increased milk protein yield and a high alpha S1 casein content in milk, which in turn could improve the gross yield and quality of cheese (Ambrosoli et al., 1988; Pirisi et al., 1994; Clark and Sherbon, 2000; Devold et al., 2011; Cebo et al., 2012).

FIGURE 2 | Amino acid sequence alignments of all casein protein variants detected in this study. (A) Alpha S1 casein, (B) beta casein, (C) alpha S2 casein, and (D) kappa casein. The reference protein variants were obtained from GenBank (CSN1S1: XP_017904616, 214 amino acids; CSN2: XP_005681778, 257 amino acids; CSN1S2: XP_013820127, 223 amino acids; CSN3: NP_001272516, 192 amino acids). The amino acid sequences of the casein variants were aligned and compared with the reference sequences using the multiple sequence alignment tool Clustal Omega (http://www.ebi.ac.uk/Tools/msa/). Signal peptide sequences are shown in gray, mature protein sequences in black, and amino acid differences in bold. The positions of the amino acid substitutions in the mature protein are shown above the sequence. An asterisk indicates identical amino acids at the given position. A colon indicates conservation between groups of strongly similar properties (score >0.5 in the Gonnet PAM 250 matrix), i.e., both amino acids are similar with respect to biological function. A dot indicates conservation between groups of weakly similar properties (score ≤0.5 in the Gonnet PAM 250 matrix).
For beta casein, two novel non-synonymous SNPs (ss7213522526 and ss7213522558) occurring in Nubian ibex and Alpine ibex were found in the signal peptide sequence. Because these SNPs do not change the encoded mature protein, no new variant name was assigned. In the authors' opinion, however, not naming amino acid variants in the signal peptide could lead to the role of the signal peptide being underestimated. For example, changes in the signal peptide might cause the protein to be mistargeted, so that it is not secreted into the milk.
The beta casein variant CSN2*A, which is believed to be the ancestral allele of CSN2, was found in all examined domesticated goat breeds and in Bezoar ibex at a frequency of at least 0.5. A high frequency of CSN2*A in domesticated goats has been described before in Saanen and Alpine goat breeds from France (Boulanger et al., 1984) and Italy (Marletta et al., 2005), as well as in goat breeds from India (Rout et al., 2010) and West Africa (Caroli et al., 2007). CSN2*A has been associated with a high beta casein content in milk (about 5 g/L per allele) compared with CSN2 null alleles (Roberts et al., 1992; Mahé and Grosclaude, 1993; Persuy et al., 1999; Neveu et al., 2002; Galliano et al., 2004; Cosenza et al., 2005; Caroli et al., 2006). We therefore hypothesize that the high frequency of CSN2*A in domesticated breeds may result from the selection of animals for milk with high protein and fat contents and good cheese-making properties (Tortorici et al., 2014; Vacca et al., 2014). CSN2*A was not found in the wild Nubian and Alpine ibex. Its absence in these species might be due to the small sample size of this study, but it is also possible that the allele assumed to be ancestral is not, in fact, the ancestral allele. Alternatively, CSN2*A might have a fitness effect in large mountain goats. These hypotheses need to be investigated further using larger samples of Nubian and Alpine ibex and other mountain goat species. Another highly frequent variant is CSN2*C, which was found in all examined goat breeds and species except Alpine ibex.
The high frequency of this variant was also evident in Northern and Southern Italian goat breeds and in Banat's White Romanian goats (Kusza et al., 2016). The new beta casein variant CSN2*F was detected in Alpine ibex only. In silico cleavage prediction suggests that the substitution of proline by leucine at position 148 could enhance cleavage of the protein by chymotrypsin.
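Why a Pro→Leu exchange can add a chymotrypsin site can be illustrated with a simplified cleavage rule. PeptideCutter's chymotrypsin model is more detailed, so the Python sketch below encodes only the textbook rule (cleavage C-terminal to aromatic residues, and to Leu and Met at low specificity, but not before proline) and is applied to toy peptides, not to the beta casein sequence.

```python
# Simplified chymotrypsin rule (an assumption, not PeptideCutter's full model):
# cut C-terminal to F, Y, W (high specificity), plus L and M (low specificity),
# unless the following residue is proline.
HIGH_SPECIFICITY = {"F", "Y", "W"}
LOW_SPECIFICITY = HIGH_SPECIFICITY | {"L", "M"}


def chymotrypsin_sites(seq: str, low_specificity: bool = True) -> list:
    """Return 1-based positions of P1 residues after which the peptide is cut."""
    targets = LOW_SPECIFICITY if low_specificity else HIGH_SPECIFICITY
    seq = seq.upper()
    return [
        i + 1
        for i in range(len(seq) - 1)  # a cut after the last residue is not a cleavage
        if seq[i] in targets and seq[i + 1] != "P"
    ]


# Toy peptides (hypothetical) differing only by Pro -> Leu at position 3
print(chymotrypsin_sites("AQPVE"))  # []
print(chymotrypsin_sites("AQLVE"))  # [3]
```

Under this rule, a Pro→Leu exchange can create a new low-specificity site after the leucine and also removes a proline that would otherwise block cleavage after a preceding target residue.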
For alpha S2 casein, four new variants were detected in this study in addition to CSN1S2*A, and preliminary names are suggested for them (CSN1S2*H, CSN1S2*I, CSN1S2*J, and CSN1S2*K). So far, 10 variants have been identified for goat alpha S2 casein (see Table 1), but only seven of them have been well characterized at the protein and DNA levels. Surprisingly, CSN1S2*A was found only in Desert goats in this study. Because many other studies (Boulanger et al., 1984; Erhardt et al., 2002; Chiatti et al., 2005; Caroli et al., 2007) found this variant at high frequency, we had expected to find CSN1S2*A in all goat breeds in our study.
With respect to kappa casein, CSN3*B is not only the reference but also the most commonly found variant in our study, which agrees with previous research (Kiplagat et al., 2010; Kupper et al., 2010; Strzelec and Niżnikowski, 2011). CSN3*K was detected in Taggar and Saanen goats in our study, although it has not been reported previously for Saanen goats from Europe (Kupper et al., 2010). CSN3*N, detected here in Nubian and Desert goats, has been reported before at low frequency in the Small East African goat from Kenya and in Long Eared Somali goats from Ethiopia and Somalia (Kiplagat et al., 2010). The new variant CSN3*X detected in Alpine ibex differs from CSN3*B by the substitutions Ser33Asn and Ser37Thr. Asparagine has biochemical properties similar to those of serine, while threonine at position 37 might enhance cleavage of the protein by proteinase K. The isoelectric focusing (IEF) pattern of CSN3*X was not tested experimentally but was predicted using ExPASy, which gave an isoelectric point of 5.53. If this prediction is correct, CSN3*X belongs to the B IEF group, whereas CSN3*B was classified in the A IEF group. Because the B IEF group is favorable for improving milk protein content and cheese-making properties, the new variant is an interesting target for milk and cheese production.
Most of the novel protein variants detected in our study were found in Nubian and Alpine ibex. This underlines the necessity to pay more attention to the study and conservation of endangered species in order to protect valuable genetic resources.

CONCLUSION
In this study, novel genetic variants of goat casein genes were discovered by capture sequencing. Most of the genetic variants, especially the non-synonymous polymorphisms, were identified in the critically endangered Nubian ibex and in Alpine ibex. We therefore emphasize the importance of preserving and studying rare and endangered species. Notably, nine new protein variants were deduced for the first time from the DNA sequences of the casein genes. Three new CSN1S1 variants were identified in Saanen goats, Bezoar ibex, and Alpine ibex, and one new variant each of CSN2 and CSN3 was detected in Alpine ibex. Four new CSN1S2 variants were found, one of which (CSN1S2*H) occurred in all studied goat breeds and species. The novel protein variants are of interest not only for their effects on protein and milk composition but also for evolutionary studies of milk protein genes. Unfortunately, neither RNA nor milk samples of the studied goat breeds were available. Further investigation is therefore necessary to examine the expression of the nine new variants at the protein level and to validate the outcomes of this study.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material. SNP genotypes are available from the European Variation Archive (EVA) under project ID PRJEB42077 at https://www.ebi.ac.uk/ena/data/view/PRJEB42077. The DNA sequencing dataset is available at the NCBI Sequence Read Archive under BioProject ID PRJNA683771 (http://www.ncbi.nlm.nih.gov/bioproject/683771). Further inquiries can be directed to the corresponding authors.

ETHICS STATEMENT
All samples were collected with permission from the owners of the animals and according to the animal protection law in Sudan.
Written informed consent was obtained from the owners for the participation of their animals in this study.

AUTHOR CONTRIBUTIONS
SR, DA, and GB conceived and designed the study. SR, MR, and LH provided the samples. SR, MR, and SK performed the experiments. SR and DA analyzed the data. SR interpreted the data and drafted the manuscript. DA and GB helped to draft the manuscript. AS, LH, MR, and SK did critical revision of the manuscript. All authors read and approved the final manuscript.

FUNDING
This study was supported by a Georg Forster Research Fellowship of Alexander von Humboldt Foundation, Germany, to SR. We acknowledge support by the German Research Foundation (DFG) and the Open Access Publication Fund of Humboldt-Universität zu Berlin.

ACKNOWLEDGMENTS
The authors thank the goat owners in Sudan, management staff of the Goat Research Stations Wad Medani, Kuku, and Dongola, and farms of Bahri University and Sudan University for providing goat samples.