Trypanosomosis: potential driver of selection in African cattle

Trypanosomosis is a serious cause of reduced productivity of cattle in tsetse-fly infested areas. Baoule and other local Taurine cattle breeds in Burkina Faso are trypanotolerant. Zebuine cattle, which are also kept there, are susceptible to trypanosomosis but larger in body size. Farmers have continuously been intercrossing Baoule and Zebu animals to increase production and disease tolerance. The aim of this study was to compare levels of zebuine and taurine admixture in genomic regions potentially involved in trypanotolerance with background admixture of composites to identify differences in allelic frequencies of tolerant and non-tolerant animals. The study was conducted on 214 animals (90 Baoule, 90 Zebu, and 34 composites), genotyped with 25 microsatellites across the genome and with 155 SNPs in 23 candidate regions. Degrees of admixture of composites were analyzed for microsatellite and SNP data separately. Average Baoule admixture based on microsatellites across the genomes of the Baoule-Zebu composites was 0.31, which was smaller than the average Baoule admixture in the trypanosomosis candidate regions of 0.37 (P = 0.15). Fixation index FST measured in the overall genome based on microsatellites or with SNPs from candidate regions indicates strong differentiation between breeds. Nine out of 23 regions had FST ≥ 0.20 calculated from haplotypes or individual SNPs. The levels of admixture were significantly different from background admixture, as revealed by microsatellite data, for six out of the nine regions. Five out of the six regions showed an excess of Baoule ancestry. Information about best levels of breed composition would be useful for future breeding activities, aiming at trypanotolerant animals with higher productive capacity.


Introduction
African animal trypanosomosis (AAT) is a severe disease caused by three species of Trypanosoma parasites: T. congolense, T. vivax, and T. brucei. Trypanosomosis is responsible for the deaths of millions of livestock each year and a reduction in the productivity of many more. With no vaccine available, and with heavy expenditure on trypanocides and vector control, trypanosomosis is estimated to cost over 4 billion US dollars each year in direct costs and lost production (Hanotte et al., 2003).
West African cattle have the ability to control parasitemia and anemia related to trypanosomosis and a greater ability to grow and produce in tsetse infested areas (Murray et al., 1984). This is thought to result from an adaptation process of indigenous cattle breeds. Trypanotolerant breeds represent a small proportion (6%) of the cattle population of Africa and 17% of the cattle in the tsetse challenged areas (Agyemang, 2005). The option of using those breeds in breeding systems thus reduces or eliminates the use of chemicals to control the trypanosomosis vector and other parasites and contributes positively to a balanced ecosystem health. The complex trypanotolerant trait that is present in indigenous West African taurine cattle is not present in the introgressed Bos indicus cattle and is dependent on admixture proportion in hybrid breeds (Freeman et al., 2004).
Humpless taurine populations (Bos taurus) are the original indigenous cattle of Africa, which entered the African continent before Zebu cattle, around 8000 years ago for longhorn and around 4750-4500 years ago for shorthorn, while the humped zebu populations (Bos indicus) were brought only later into the African continent, via the Horn of Africa (Loftus et al., 1994; Bradley et al., 1996; MacHugh et al., 1997; Hanotte et al., 2002; Epstein, 1971). It is believed that taurine cattle penetrated West African forests about 4000 years ago (MacDonald and MacDonald, 2002; Freeman et al., 2004) while zebu populations arrived 1300-1000 years ago (Epstein, 1971; MacHugh et al., 1997; MacDonald and MacDonald, 2002).
There is evidence indicating that host genetic factors play a significant role in determining an individual's susceptibility/resistance status to trypanosoma infection (Murray et al., 1984; Hanotte et al., 2003; Courtin et al., 2006, 2007, 2008). Trypanotolerance of cattle can be associated with genomic regions known as trypanotolerance candidate regions. Observations from Naessens et al. (2002) confirm that this trypanosoma tolerance encompasses at least two mechanisms: one that improves the control of parasitemia and another that limits anemia. The physiological and genetic mechanisms underlying trypanotolerance are being extensively investigated. Hanotte et al. (2003) performed experimental crossing of trypanotolerant N'Dama (Bos taurus) and trypanosusceptible improved Kenya Boran (Bos indicus) cattle, and mapped QTLs associated with trypanotolerance on 18 autosomes. Results suggest that selection for trypanotolerance within an F2 cross between N'Dama and Boran cattle could produce a synthetic breed with higher trypanotolerance levels than currently exist in the parental breeds. Noyes et al. (2010) performed a genetic expression analysis to identify candidate genes in pathways responding to T. congolense infection.
Evidence for selective sweeps was observed at TICAM1 and ARHGAP15 loci in African taurine cattle, leading the authors to propose these genes as strong candidates to explain the QTL. Candidate QTL genes were identified in other QTL by their expression profile and the pathways in which they participate. Dayo et al. (2009) tested heterozygosity and variances in microsatellite allelic size among trypanotolerant and trypanosusceptible breeds which led to two significantly less variable microsatellite markers. One of these two outlier loci is located within the confidence interval of a previously described QTL underlying a trypanotolerance-related trait (Hanotte et al., 2003). Stella et al. (2010) analyzed selection signatures by contrasting 32,689 SNP genotypes of trypanotolerant African taurine N'Dama and Sheko cattle with those of all other breeds included in the Bovine HapMap study (Bovine HapMap Consortium, 2009). The overlap of candidate regions found in different studies is comparatively small.
Among the indigenous cattle in Burkina Faso, the Baoule, a taurine breed native to the tsetse-challenged southern part of the country, is known for its ability to cope with trypanosome infections. Pure Zebu (Bos indicus) is much more susceptible to the disease, but still preferred by farmers because of its body size and suitability as a draft animal. With the intention of having both big and tolerant animals, many farmers use composites, continuously mating Zebu, Baoule and their crosses. The preference for larger animals means that Zebu ancestry is predominant among the admixed animals. In genomic regions responsible for trypanotolerance, however, higher levels of Baoule ancestry are expected. In a paper on approaches to detect signatures of selection from genome wide scans, Oleksyk et al. (2010) describe a way of detecting significant differences of local admixture levels in crossbred/admixed individuals compared to the average admixture across their genomes. This method can be applied to identify genome signatures of historic selective pressures on genes and gene regions.
The aim of this study was to compare levels of zebuine and taurine admixture in candidate regions for trypanotolerance with the "background" admixture levels, to identify differences in allelic frequencies of trypanotolerant and non-tolerant breeds, and to assess individual differences in admixture for particular animals. Regions potentially responsible for trypanotolerance were identified based on composite log-likelihoods of the differences in allelic frequencies of trypanotolerant and non-tolerant breeds, using Bovine HapMap data (Bovine HapMap Consortium, 2009). Individual admixture levels in these regions were compared with admixture levels of the background genome of composite Baoule x Zebu animals.

Study Design and Animals
Blood was taken from 214 animals in total, of which 90 were Baoule from South West (SW) Burkina Faso, 90 were Zebu from the North (n = 54) and SW (n = 36) regions and 34 were Baoule-Zebu composites from SW. The North of Burkina Faso is part of the Sahelian region with no threat of trypanosomosis, while SW is a Sudanese region that is heavily tsetse infested.
Assignment of animals to breed was based on information provided by the owners of the animals. Animals were from 23 different locations in Burkina Faso. FTA cards were used for collection and storage of blood for all animals.

Discovery of Regions for Selective SNP Genotyping
Only a small number of SNPs could be selectively genotyped in this project. For the choice of these SNPs, the selection signature approach and sampling of animals by Stella et al. (2010) was employed. Data were from the International Bovine HapMap study (Bovine HapMap Consortium, 2009), including the trypanotolerant African taurine breeds N'Dama and Sheko. Baoule is very closely related to N'Dama (Decker et al., 2014). The 32,689 HapMap SNPs as well as 54,001 Illumina 50k bovine BeadChip SNPs were available for analysis, extending the study of Stella et al. (2010). These two sources of data were merged and, after quality control (a minor allele frequency threshold of 0.05, a minimum call rate of 0.95 per SNP and removal of duplicate SNPs), the final data set comprised 71,235 SNPs.
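The quality control thresholds described above can be illustrated with a minimal sketch. The genotype coding (allele counts 0/1/2, None for a missing call), the function names and the dictionary layout are assumptions made for illustration only; the text does not state which software or data format was used for this step. Keying the SNPs by name is one simple way to avoid duplicate entries.

```python
from typing import Dict, List, Optional

# Assumed coding: count of one allele (0, 1, 2), or None for a missing call.
Genotype = Optional[int]


def call_rate(genotypes: List[Genotype]) -> float:
    """Fraction of animals with a non-missing genotype at this SNP."""
    if not genotypes:
        return 0.0
    return sum(g is not None for g in genotypes) / len(genotypes)


def minor_allele_frequency(genotypes: List[Genotype]) -> float:
    """Frequency of the rarer allele among the called genotypes."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    p = sum(called) / (2 * len(called))
    return min(p, 1.0 - p)


def quality_control(
    snps: Dict[str, List[Genotype]],
    min_maf: float = 0.05,
    min_call_rate: float = 0.95,
) -> List[str]:
    """Return the names of SNPs passing both thresholds, in input order."""
    return [
        name
        for name, genotypes in snps.items()
        if call_rate(genotypes) >= min_call_rate
        and minor_allele_frequency(genotypes) >= min_maf
    ]
```

In this sketch a monomorphic SNP fails the minor allele frequency filter and a SNP with 10% missing calls fails the call rate filter.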
To identify putative selection signatures, allelic frequencies of the N'Dama (N = 22) and Sheko (N = 19), either pooled or separate, were compared to the allelic frequencies of the entire population (N = 497) in the study and nominal P-values were calculated for the differences in frequencies at each SNP. The nominal P-values were then used to calculate composite log-likelihoods (CLL) for sliding windows of 9 SNPs across the genome. To determine statistical significance, permutation testing was employed by comparing the CLL to the distribution of 50,000 permutations of CLL obtained with random samples of animals (i.e., across all HapMap breeds).
The signals typically pointed to narrow regions (0.2-0.4 Mb), with an average of 254,841 bp. A total of 158 SNPs from 23 regions with strong signals (genome-wide P < 0.01 in each breed) were chosen. Within each region, 4-10 roughly equally spaced SNPs were selected from the Illumina database for genotyping. The rationale for this approach was that signatures of selection are likely linked to trypanosome tolerance in the African taurine breeds. Also, the signals were narrow compared to the results of QTL analyses available at the time, allowing targeted SNP selection. Furthermore, the signatures targeted were observed in both the N'Dama and Sheko, perhaps suggesting that they arose in a past ancestral population of all trypanotolerant breeds and were likely to be present in the Baoule as well.
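The sliding-window and permutation steps can be sketched as follows. This is only an illustration under stated assumptions: the exact CLL formula is not given in the text, so the window score here is assumed to be the sum of -log10 of the nominal P-values, and the function names are hypothetical. The empirical P-value compares an observed window score with scores obtained from permuted samples of animals.

```python
import math
from typing import List, Sequence


def composite_log_likelihood(p_values: Sequence[float], window: int = 9) -> List[float]:
    """Sliding-window score over consecutive SNPs along a chromosome.

    Assumption: the score is the sum of -log10(P) within each window;
    the source does not spell out the formula. All P-values must be > 0.
    """
    logs = [-math.log10(p) for p in p_values]
    return [sum(logs[i:i + window]) for i in range(len(logs) - window + 1)]


def empirical_p_value(observed: float, permuted: Sequence[float]) -> float:
    """Share of permuted scores that are at least as large as the observed one.

    The +1 in numerator and denominator keeps the estimate above zero.
    """
    exceed = sum(score >= observed for score in permuted)
    return (exceed + 1) / (len(permuted) + 1)
```

With 50,000 permutations, as in the text, the smallest attainable empirical P-value under this convention would be 1/50,001.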

Choice of Microsatellites
To reflect the admixture levels in the background genome of the animals in this study, a total of 25 autosomal microsatellites were chosen, giving a preference to FAO recommended markers (FAO, 2011), without considering information about trypanosome candidate regions. For the autosomal chromosomes, a total of 31 microsatellite primers were initially chosen for the amplification of the genomic DNA. Fifteen primers were donated by the International Livestock Research Institute, Nairobi, Kenya. PCR conditions were optimized and all 31 microsatellites were tested for polymorphism. A final panel of 25 microsatellites was selected for genotyping of the cattle populations. Twenty-two of them (BM1818, BM1824, BM2113, CSSM066, ETH3, ETH10, ETH185, ETH225, HAUT24, HAUT27, HEL1, HEL5, HEL9, HEL13, ILSTS005, ILSTS006, INRA023, INRA032, TGLA53, TGLA122, TGLA126, and TGLA227) were from a list recommended by the Food and Agriculture Organisation (FAO) and the International Society for Animal Genetics (ISAG) for use in cattle diversity studies. The others, namely AGLA293, ILSTS033, and MGTG4B, were on neither the FAO nor the ISAG list. The microsatellites were selected using information from the National Centre for Biotechnology Information (NCBI, http://www.ncbi.nlm.nih.gov/) database. The selected microsatellites covered 22 autosomal chromosomes.

Genotyping of Animals
Genomic DNA was isolated from white blood cells according to a modified Whatman protocol (Whatman FTA Protocol BD09). Genotyping of the 25 microsatellites was performed on a MegaBACE™ 500 genotyping device. The PCR reaction mixture used for amplification of the autosomal microsatellites had a final volume of 22 µl and included 10 ng of template genomic DNA, 8.05 µl of double distilled water, 3.20 µl of 10 × Buffer B (Mg2+ free, containing 0.8 M Tris-HCl, 0.2 M (NH4)2SO4, 0.2% w/v Tween-20), 2 µl of 2 mM dNTP mix, 1.60 µl of 25 mM MgCl2, 0.5 µl of each forward and reverse primer and 0.15 µl of 5 U/µl FIREPol® DNA polymerase. One primer in each pair was labeled with FAM or TET. The 155 selected SNPs were multiplexed and genotyped on the Sequenom MassARRAY system. The choice of SNPs within the regions of interest was guided by a bioinformatic protocol optimizing the multiplexing strategy. We tried to space the SNPs equally across the 200-400 kb regions of interest. The total number of SNPs targeted for genotyping was 150 with a minimum of 5 per region. The master mix for amplifying the SNP regions comprised 0.50 µl of each forward and reverse primer and 0.20 µl of 5 U/µl HotStar Taq DNA polymerase, plus 3 µl (3 µg) of salmon sperm DNA added to the whole master mix. The total volume per tube was 4 µl. After the PCR, a digestion with shrimp alkaline phosphatase (SAP) was carried out. The SAP cleaves a phosphate from the unincorporated dNTPs, converting them to dNDPs and rendering them unavailable to future reactions. The SAP mix was made from 1.5 µl of water (HPLC grade), 0.17 µl of 10 × SAP buffer and 0.30 µl of 1.7 U/µl SAP enzyme. 2 µl of SAP mix was added to the normal PCR product for digestion. The digestion was followed by an iPLEX PCR. The iPLEX mix was made of 0.619 µl of water, 0.20 µl of 10 × iPLEX buffer, 0.2 µl of iPLEX Termination mix, 0.041 µl of iPLEX Enzyme and 0.940 µl of the extend primer. 2 µl of the iPLEX mix was added to the digested product.
The following cycling program was run for amplification: 5 min initial denaturation at 95 °C followed by 35 cycles of denaturation at 95 °C for 1 min, annealing at 55 °C for 1:30 min, extension at 65 °C for 3 min and a final extension step at 65 °C for 5 min, using an Applied Biosystems 96-Well GeneAmp® PCR System 9700 thermal cycler.
The normal PCR of the SNP study was run according to the following protocol: 2 min initial denaturation at 95 °C followed by 45 cycles of denaturation at 95 °C for 0:30 min, annealing at 56 °C for 0:30 min, extension at 72 °C for 1 min and a final extension step at 72 °C for 5 min and 4 °C for 5 min. The digestion ran for 45 min: 40 min at 37 °C and 5 min at 85 °C. The iPLEX PCR was run as follows: 0:30 min of initial denaturation at 94 °C, then denaturation at 94 °C for 0:05 min, annealing at 52 °C for 0:05 min and extension at 80 °C for 0:05 min. The annealing and extension steps were repeated 5 times within each cycle, and the cycle from the second denaturation to the extension was repeated 40 times. A final extension step at 72 °C was run for 3 min, followed by a hold at 4 °C. The normal PCR, the SAP digestion and the iPLEX PCR were performed using an Applied Biosystems 384-Well GeneAmp® PCR System 9700 thermal cycler.
SNPs were positioned according to Btau 4.0. Monomorphic SNPs and SNPs with more than 10% of missing values were excluded, and data analysis was performed with the remaining 135 SNPs. The average linkage disequilibrium level, calculated as R-squared values of SNPs within candidate regions, was 0.091, with 5% and 95% quantiles of 0.00003 and 0.478.
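As an illustration of a pairwise R-squared value, the sketch below computes the squared Pearson correlation between two SNPs coded as allele counts (0/1/2). This genotype-based estimate is an assumption; the text does not state whether R-squared was derived from genotypes or from phased haplotypes, and the function name is hypothetical.

```python
from typing import Sequence


def genotype_r_squared(x: Sequence[int], y: Sequence[int]) -> float:
    """Squared Pearson correlation between two SNPs coded as allele counts."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("genotype vectors must be non-empty and of equal length")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # A monomorphic SNP carries no information about LD.
        return 0.0
    return cov * cov / (var_x * var_y)
```

Averaging this value over all SNP pairs within a region would give a regional summary comparable in spirit to the average reported above.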

Data Analysis
Processing of raw data to formats usable in PLINK was done with SAS (SAS Institute Inc, 2009). Ancestry inferences were performed using STRUCTURE (Pritchard et al., 2000, 2010; Hubisz et al., 2009). STRUCTURE uses a model-based clustering algorithm to infer population structure using genotype data. The software clusters data according to allele frequencies into K populations. As there was linkage disequilibrium in our SNP data, we used version 2.3.4. We employed the admixture model using a burn-in period of 10,000 repeats followed by 10,000 Markov Chain Monte Carlo (MCMC) repeats and considering SNP frequencies correlated. Convergence of the MCMC was investigated with several STRUCTURE runs on the same datasets. STRUCTURE analyses were all supervised, with added information on pure breed or cross identity. The assumption of a two-breed cross was confirmed with the Admixture software (Alexander et al., 2009) with K from 2 to 7; the lowest cross validation error was at K = 2 (cv = 0.48). PLINK (Purcell et al., 2007) was used to recode alleles for analysis. AlphaPhase (Hickey et al., 2011) was used for haplotype imputation of SNPs from candidate regions. To evaluate population differentiation, proc ALLELE of SAS/GENETICS 9.2 was used to calculate the fixation index (FST) for every microsatellite, SNP and haplotype derived from candidate regions. This calculation was based on variance in allele frequencies (Weir and Cockerham, 1984; Weir and Hill, 2002).
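To give an intuition for how FST relates to allele frequencies, the sketch below computes a simplified Wright-type FST for one biallelic locus and two equally weighted populations. It is deliberately not the Weir and Cockerham (1984) estimator used in this study, which additionally corrects for sample sizes; the function name and the equal weighting are assumptions made for illustration.

```python
def fst_two_populations(p1: float, p2: float) -> float:
    """Simplified F_ST = (H_T - H_S) / H_T for one biallelic locus.

    p1 and p2 are the frequencies of the same allele in the two populations.
    Not the Weir-Cockerham estimator: no sample-size correction is applied.
    """
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)  # expected heterozygosity, pooled
    if h_total == 0:
        return 0.0  # locus fixed for the same allele in both populations
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population
    return (h_total - h_within) / h_total
```

Under this simplification, allele frequencies of 0.9 and 0.1 in the two populations give an FST of about 0.64, while identical frequencies give 0.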

Results
Crosses/composites of trypanotolerant and trypanosusceptible cattle were the focus of this analysis. STRUCTURE results indicate that the information acquired from farmers about pure Baoule and Zebu breed types is reasonably accurate with 0.87/0.89 and 0.07/0.06 Baoule ancestry proportions for these two breeds based on microsatellite/SNP markers (see Table 1). Average Baoule admixture in background genomes (as assessed by microsatellite markers) of Baoule-Zebu composites was 0.31, which was somewhat, but not significantly (p = 0.15 based on a t-test), smaller than the average Baoule admixture in the AAT candidate regions (assessed by SNP markers), 0.37. Admixture proportions were also determined for each genomic region potentially implicated in AAT tolerance (Table 1) Table 2. Haplotypes with more than 10% frequency in at least one of the breeds are given in Table 3 for the nine candidate regions with FST ≥ 0.20. Reconstructed haplotypes showed differentiation of the Zebu and Baoule individuals. For the region on CHR 21 (20.40-20.60 Mb), the total frequency of two haplotypes (out of 22) was 83.34% for Baoule, in composites the frequency of these haplotypes was 51.52% (they had 13 haplotypes in total) whereas their frequency (16 haplotypes) in Zebu animals was 22.03% (Table 2). Genes found in candidate genomic regions studied as recovered from Ensembl (www.ensembl.org) are provided in Supplementary Table 1. Their potential relevance to trypanotolerance based on information from other studies is discussed below.
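The comparison of candidate-region and background admixture in the same animals can be illustrated with a short sketch. Note that this swaps in a paired sign-flip permutation test rather than the t-test reported above, because it needs nothing beyond the standard library; the function name, the pairing of values per animal and the default settings are assumptions for illustration, and the inputs would be per-animal Baoule ancestry proportions.

```python
import random
from typing import Sequence


def paired_sign_flip_test(
    background: Sequence[float],
    candidate: Sequence[float],
    n_permutations: int = 10000,
    seed: int = 1,
) -> float:
    """Two-sided permutation P-value for a mean paired difference of zero.

    Under the null hypothesis the sign of each within-animal difference
    (candidate minus background) is arbitrary, so signs are flipped at random.
    """
    if not background or len(background) != len(candidate):
        raise ValueError("inputs must be non-empty and of equal length")
    diffs = [c - b for b, c in zip(background, candidate)]
    n = len(diffs)
    observed = abs(sum(diffs) / n)
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    hits = 0
    for _ in range(n_permutations):
        flipped = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(flipped / n) >= observed:
            hits += 1
    return (hits + 1) / (n_permutations + 1)
```

A small P-value would indicate that ancestry in the tested region departs from the background level more than expected by chance; the direction of the departure is given by the sign of the mean difference.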

Discussion
Baoule admixture across the genome, based on a sample of 25 microsatellite markers, of the Baoule-Zebu composites was 0.31, compared to the average Baoule admixture based on SNPs in AAT candidate regions (0.37), see Table 1. This difference was not significant, though (P = 0.15). Admixture was measured with two distinct types of markers; justification for this can be found in a study of Schopen et al. (2008), showing that the information content of one microsatellite is equivalent to that of about three SNPs in cattle. Similar results were provided by Gärke et al. (2012) when analyzing population differentiation of chicken breeds. The average number of haplotypes per region was 12.59 for Baoule, 11.91 for Zebu and 10.64 for composites. We found slightly more haplotypes in our candidate regions for Baoule (277) compared to Zebu (262); the lower number of haplotypes in composites is most likely due to the smaller sample size. The results are in contrast to Murray et al. (1984) who found higher diversity in B. indicus compared to B. taurus. It is known that the West African B. taurus populations contain a degree of B. indicus admixture (Alvarez et al., 2014), but the proportion is small in Baoule (Hanotte et al., 2003; Soudré et al., 2013). Due to the very low FST in some candidate regions, STRUCTURE was not able to separate pure breeds in those regions. Overall FST calculated with SNPs from candidate regions (FST = 0.14) matches that from other studies, see Dayo et al. (2009), while FST calculated with microsatellites (FST = 0.09) was lower. When looking at FST for single SNPs, the highest value was 0.70 (Barreiro et al., 2008). Therefore we concentrated on candidate regions which showed FST > 0.20. Using information from sex linked markers Soudre (2011) found a relative age of admixture of 69 ± 43 years from 2007 data for the crosses analyzed in this study.
This is consistent with the findings of Grace (2006) (Gautier et al., 2007, 2009)

Conclusions
In this study admixture in genomic regions potentially related to trypanotolerance was compared with admixture in the background genome. A non-significant trend of higher proportions of Baoule admixture in the candidate regions was found and a majority of regions (5 of 6) with admixture levels significantly different from background admixture indicated high levels of Baoule ancestry.
In this study, the discovery of trypanotolerance candidate regions was performed via a selection signature approach based on differences of allelic frequencies of trypanotolerant African taurine breeds versus other breeds around the world, using data from the Bovine HapMap consortium. Targeted SNP genotyping in candidate regions was the method of choice. Given the reduction of cost for high density SNP chip genotyping, part of the samples used in this study are now being genotyped with the commercial chip of Illumina Inc., covering almost 800,000 SNPs. A large number of markers will also allow estimation of individual age of admixture and therefore the number of generations of natural selection acting on the composites. Targeted resequencing approaches of interesting candidate regions that can identify both common and exceedingly rare causal variants could potentially give more insight into trypanotolerance mechanisms. Silbermayr et al. (2013) developed a novel qPCR assay for indication of infection status of animals with the three trypanosome species involved in AAT (T. vivax, T. congolense, and T. brucei) from blood samples of most of the animals involved in this study. Zebus were twice as often infected (21.74%) compared to Baoule (9.70%) and composites (9.57%). Phenotypic measures of trypanosomosis by routine checking of infection status will help to identify the best composites. Information about the best levels of admixture in composites is a prerequisite for more effective and sustainable use of trypanotolerant types of cattle.

Author Contributions
JS conceived the study, with the support of OH. ASo collected samples and background information, as suggested by MW, GB, and MM provided genotyping facilities and support. The study to identify trypanotolerance candidate regions was performed by PB and ASt. Genotyping of bovine microsatellites was performed by ASo and SM, SNPs were genotyped by JB and ASo while KS genotyped parasites. Admixture and FST analysis was performed by ASm who also drafted the manuscript. Results were interpreted by all authors; PB, JS, GM, and ASm provided the biggest contributions in manuscript revision. All authors read and approved the final manuscript. ASm - Anamarija Smetko; ASo - Albert Soudre; ASt - Alessandra Stella. The views expressed in this publication are those of PB and do not necessarily reflect the views or policies of FAO.