Taurine and Indicine Haplotype Representation in Advanced Generation Individuals From Three American Breeds

Development of the American Breeds of beef cattle began in the 1920s as breeders and U. S. Experiment Station researchers began to create Bos taurus taurus × Bos taurus indicus hybrids using Brahman as the B. t. indicus source. By 1954, U.S. Breed Associations had been formed for Brangus (5/8 Angus × 3/8 Brahman), Beefmaster (½ Brahman × ¼ Shorthorn × ¼ Hereford), and Santa Gertrudis (5/8 Shorthorn × 3/8 Brahman). While these breeds were developed using mating designs expected to create base generation animals with the required genome contributions from progenitor breeds, each association has now registered advanced generation animals in which selection or drift may have caused the realized genome compositions to differ from initial expected proportions. The availability of high-density SNP genotypes for 9,161 Brangus, 3,762 Beefmaster, and 1,942 Santa Gertrudis animals allowed us to compare the realized genomic architectures of breed members to the base generation expectations. We used RFMix to estimate local ancestry and identify genomic regions in which the proportion of Brahman ancestry differed significantly from a priori expectations. For all three breeds, lower than expected levels of Brahman composition were found genome-wide, particularly in early-generation animals where we demonstrate that selection on beef production traits was likely responsible for the taurine enrichment. Using a proxy for generation number, we also contrasted the genomes of early- and advanced-generation animals and found that the indicine composition of the genome has increased with generation number likely due to selection on adaptive traits. Many of the most-highly differentiated genomic regions were breed specific, suggesting that differences in breeding objectives and selection intensities exist between the breeds. Global ancestry estimation is commonly performed in admixed animals to control for stratification in association studies. However, local ancestry estimation provides the opportunity to investigate the evolution of specific chromosomal segments and estimate haplotype effects on trait variation in admixed individuals. Investigating the genomic architecture of the American Breeds not only allows the estimation of indicine and taurine genome proportions genome-wide, but also the locations within the genome where either taurine or indicine alleles confer a selective advantage.


INTRODUCTION
Indicine cattle were first imported into the United States from India in 1906 and then from Brazil in the 1920's and were used via crossbreeding with taurine cattle and backcrossing to develop the Bos taurus indicus Brahman (Sanders, 1980) which has very little residual Bos taurus taurus within its genome (Chan et al., 2010). The American Breeds of beef cattle are populations that were developed in the United States beginning shortly after the introduction of the B. t. indicus cattle to capitalize on breed complementarity and heterosis for production and adaptation to heat stress and the nutritional limitations, parasites, and diseasecausing pathogens prevalent in the southern tier of the country (Cartwright, 1970;Dickerson, 1970). Indicine × taurine crossbred individuals have been widely produced throughout subtropical and tropical regions of the world (Porto-Neto et al., 2014; Several approaches have been developed for the estimation of local ancestry (breed of origin of the two alleles present at specific loci) in admixed individuals, however, these applications have primarily been focused on recently admixed populations. Individuals from admixed populations have chromosomes that comprise mosaics of chromosomal segments originating from each of the ancestral populations (Thornton and Bermejo, 2014). On the other hand, global ancestry estimates predict the relative proportions of the ancestral genomes present in an admixed individual, which is an average of the local ancestry estimates, and ignores information pertaining to the variability among locus-specific ancestries (Tang et al., 2005). Drift and strong selection can lead to regions of the genome with ancestries that differ significantly from breed expectation and examination of these regions may identify candidate genes that are under selection and suggest the nature of the selected phenotype. We estimated local ancestry for registered Brangus, Santa Gertrudis, and Beefmaster animals that had been genotyped with the BovineSNP50, or derivative assays, and examined the average ancestries at specific chromosomal locations to identify regions of the genome that differ from expected global proportions both within and across breeds. Using the total number of haplotypes detected in each animal's genome as a proxy for its generation number, we also contrasted the genomes of early-and advancedgeneration animals to ascertain those genomic regions which had been exposed to recurrent selection within each of the breeds.
HD assay. Following imputation, each sample had genotypes for 836,118 variants.

Reference Panels
Local ancestry estimation requires a reference panel of genotypes for representatives of each ancestral population. We developed two reference panels for each American Breed. The first panel was identical for all three breeds and comprised animals registered by the American Angus Association, American Hereford Association, American Shorthorn Association, and American Brahman Breeders Association for which CRUMBLER  breed composition estimates were ≥85% to the respective breed ( Table 2). The second reference panel was created using only individuals from the respective ancestral breeds (ANCESTRAL reference) ( Table 2). Since some ancestry estimation software is influenced by unequal reference panel sample sizes (Maples et al., 2013;Crum et al., 2019), the breed possessing the smallest number of available individuals determined the approximate random sample size for the remaining breed(s). Reference panel individuals had been genotyped using one of 8 commercially available assays including the GeneSeek BOVG50v1, GGP-F250, GGP-HDV3, and GGP-LDV3, the BovineHD, BovineSNP50, i50K, and Irish Cattle Breeding Federation (Cork, Ireland) IDBv3, and were phased and imputed following the same procedures as for the American Breed individuals. Gobena et al. (2018) used ADMIXTURE to estimate ancestry in an Angus × Brahman population, however, we have previously found the software to be sensitive to the order of animals within the input files  and consequently we used RFMix v2.03, for local ancestry and admixture estimation (Maples et al., 2013). RFMix partitions chromosomes into non-overlapping windows and infers ancestry for each window. If contiguous windows are assigned the same ancestry, RFMix concatenates the windows resulting in a variable number of haplotypes of varying length predicted for each individual. We first inferred local ancestry with a window size of 100 SNPs (spanning an average of ∼300 kb) using two reference panels for comparison. We next examined a window size of 25 SNPs (spanning on average ∼75 kb) for the ANCESTRAL reference panel. RFMix allows the specification of the number of generations separating the query samples from the initial reference population admixture event. However, because pedigree and generation information were not provided by the American Breed Associations, we used a generation interval of 5 years and the dates of formation of each Breed Association to arrive at an estimate of a maximum of 16 generations, with an average of about 8 generations for these individuals. This reflects the fact that first generation American Breed animals are continuously being generated and registered by breeders using superior animals from the requisite foundation breeds to capture the benefits of ongoing selection within the numerically larger foundational breeds.

Local Ancestry Estimation
Many of the American Breed Associations allow the registration of the purebred, F 1 , and back-cross animals used to create first generation registered animals to enable a complete pedigree based on consistent registration numbers within their herdbook. To determine if some of these animals may have been genotyped and provided by the Breed Associations, we removed samples assigned by CRUMBLER to be ≥ 50% Brahman or ≥ 90% Angus or Shorthorn ancestry from the Brangus and Santa Gertrudis data, respectively (Brangus, n 297; Santa Gertrudis, n 4). For the Beefmaster, we removed samples with ≥ 90% assignment to Shorthorn, Hereford, or Brahman to remove potential purebred founders and any samples with ≤ 5% assignment to any one of these breeds to remove potential F 1 individuals (n 10).

Generation Proxy
Genotypes used in this study were provided by the American Breed Associations with each animal's identity anonymized and without  pedigree or generation information. Since selection operates each generation, the cumulative effects of selection on a composite genome should increase with generation number. To enable the stratification of the animals within each American Breed based on generation number, we utilized the RFMix output to compute the total number of taurine and indicine haplotypes present within the diploid autosomal genome (2N 58) of each animal and the average length of all haplotypes within the diploid genome. Under the assumption of selective neutrality, the number of haplotypes within the genome should increase each generation due to recombination and correspondingly, the average length of the haplotypes should decrease each generation. Purebred and F 1 animals have 58 haplotypes represented as full length chromosomes and assuming an average of one crossover per chromosome pair each meiosis, back-cross (Purebred × F 1 ) animals have, on average, 87 haplotypes which average ¾ of the chromosome length and F 2 (F 1 × F 1 ) animals have, on average, 116 haplotypes which average ½ of the chromosome length.

Genomic Divergence From Breed Expectation
Within each of the American Breeds, for the ith window within the genome we tested the null hypothesis that Ho: θ i θ against the alternate hypothesis Ha: θ i ≠ θ using the Z-statistic: Here, θ i is the breed Brahman proportion and p i is the sample Brahman proportion within the ith window. The parameter θ represents the American Breed's genome average expected Brahman proportion under selective neutrality in the absence of drift and was set to θ 0.375 for Brangus and Santa Gertrudis, and 0.5 for Beefmaster. Var(θ i ) is the variation in Brahman proportion across windows throughout the genome under selective neutrality and in the absence of drift. Var(θ i ) cannot be estimated from the sample unless the null hypothesis is true and so to obtain an estimate of this parameter for each breed composition, we conducted a simulation using 1,000 animals per generation with each animal genotyped at the 836,118 loci with ARS-UCD1.2 reference genome coordinates. The number of recombination events per chromosome was assumed to follow a Poisson distribution with mean (λ) determined by chromosome length × recombination rate (e.g., λ 158.532931 Mb × 0.01 recombination events per Mb 1.58532931 for chromosome 1). The location of recombination events within each chromosome was simulated by sampling from a uniform distribution. For the Brangus and Santa Gertrudis simulation, 1,000 first generation ⅜ × ⅝ genomes were simulated by first creating 1,000 F 1 × Purebred crosses (¾ taurine × ¼ indicine) which were then randomly mated to independently sampled F 1 individuals. The first-generation animals were then randomly mated to produce 1,000 s generation individuals and so on for 8 generations of random mating. Ten replicate simulations were performed and within each replicate, θ was estimated as the average Brahman proportion across all genotyped loci and Var θ i was estimated as the square root of the variance of the Brahman proportion across all loci in the 8th generation individuals. Estimates of θ θ ± SD θ ; 0.3722 ± 0.0013 and Var θ i Var θ i ± SD Var θ i ; 0.0106 ± 0.0002 were obtained by averaging estimates across replicates. The Beefmaster simulation was identical to that for the Brangus and Santa Gertrudis except the 1,000 first generation ½ × ¼ × ¼ genomes were simulated by randomly sampling and mating 1,000 F 1 genomes (½ taurine × ½ indicine). Estimates of θ and Var θ i were 0.4976 ± 0.0013 and 0.0111 ± 0.0002, respectively. These estimates of Var θ i were used in place of Var θ i in the calculation of the Z-statistics for each of the American Breeds. Controlling for the error rate in multiple testing of windows within each breed was achieved using adjusted p-values proposed by Benjamini and Hochberg. (1995) at a false discovery rate (FDR) ≤ 0.001. Significant regions were queried for QTL reported in the CattleQTLdb (https://www.animalgenome.org/ cgi-bin/QTLdb/BT/search; Hu et al., 2019).

Genomic Divergence Between Early-and Advanced-Generations
Within each of the American Breeds, we sampled ∼10% of individuals with the smallest and largest number of haplotypes with their genomes to characterize early-and advancedgeneration individuals, respectively. Sampling was such that all animals within a haplotype number class were included resulting in slightly unequal sample sizes. There were 918, 207, and 400 animals in the early-and 955, 213, and 423 Brangus, Santa Gertrudis, and Beefmaster animals in the advanced-generation groups, respectively.
For the ith window within the genome, we tested the null hypothesis that Ho: θ ei θ ai against the alternate hypothesis Ha: θ ei ≠ θ ai using the Z-statistic: Here, θ ei and θ ai are the ith window Brahman proportions within the early-and advanced-generation individuals within the breed, which are estimated by the sample proportions p ei and p ai , respectively. The statistic p i n e p ei + n a p ai ne+na is the pooled estimate of the Brahman proportion when the null hypothesis is true and n e and n a are the numbers of haplotypes within the early-and advanced-generation individuals (twice the sample sizes), respectively. Multiple testing error rate was again controlled using Benjamini and Hochberg. (1995) adjusted p-values at a FDR ≤ 0.001. Significant regions were queried for QTL reported in the CattleQTLdb. This test was also applied genome-wide to test for differences in the global Brahman genome composition of early-and advanced-generation individuals.

Reference Panels
RFMix generates three output files for each analysis: 1) the most likely reference population assignments for each haplotype found in a window, 2) the marginal probabilities of each reference population being the ancestral population of origin for the haplotypes found in the window, and 3) global diploid ancestry estimates (Maples et al., 2013). We utilized the most likely assignment files to define the breed of origin of haplotypes found within each window and then computed the relative frequencies of haplotype origin from each representative breed in each of the windows.
Following the strategy employed by Browning et al. (2016), we created two reference panels to evaluate the effects of panel size on the estimation of local ancestry. We first selected the Brahman, Angus, Hereford, and Shorthorn animals present in the CRUMBLER  reference panel which had been extensively developed and evaluated for use in global ancestry estimation for breeds commonly found in North America. Alternatively, for each of the American Breeds, an ANCESTRAL reference panel was created from all available registered animals. To avoid potential issues caused by unequal sample sizes, we randomly down-sampled the larger populations to the sample size for the breed with the fewest available samples ( Table 2). For each of the American Breeds, the ANCESTRAL reference panel contained at least twice the number of animals from the ancestral populations than the CRUMBLER reference panel and ancestry estimates produced using the two panels were similar (Supplementary Figures  Frontiers in Genetics | www.frontiersin.org October 2021 | Volume 12 | Article 758394 9 S1-S3). However, the proportion of Hereford ancestry appeared to be underestimated and the proportion of Shorthorn ancestry overestimated when the CRUMBLER reference panel was used for local ancestry estimation in Beefmaster (Supplementary Figure S3). Consequently, we decided to utilize the ANCESTRAL reference panels for all further analyses.

Window Size
Since RFMix concatenates contiguous windows with the same ancestry to form extended haplotypes within individuals and considering the large number of SNPs used in this study, we were interested in whether a window size of 25 SNPs would impact the analysis in terms of resolution near recombination breakpoints or the ability to discriminate between haplotype breed of origin.

Generation Proxy
Using the RFMix output for all samples provided by the American Breed Associations (Table 2), we estimated the number of Angus, Shorthorn, Hereford, and Brahman haplotypes present within the genotyped individuals, and also the average length of these haplotypes by breed of origin. Haplotype metrics for all three of the American Breeds are presented in Supplementary Table S1 and suggest that the genomic architectures of the Brangus and Santa Gertrudis breeds are quite similar. Each breed possesses, on average, 50% more taurine than indicine haplotypes and the taurine haplotypes are almost twice the length of the indicine haplotypes.
The correlation between the number of Angus and Brahman haplotypes in each Brangus individual was 0.80, while the correlation between the number of Shorthorn and Brahman haplotypes in each Santa Gertrudis individual was 0.73. This follows our expectation that the total number of haplotypes from each of the foundation breeds increases in advanced generation animals due to recombination. However, the correlation between the number of Brahman and taurine (Shorthorn and Hereford) haplotypes in each Beefmaster individual was only 0.40. The correlation between the number of Brahman and Hereford haplotypes was 0.50 while the correlation between the number of Brahman and Shorthorn haplotypes was −0.11 suggesting that other forces such as selection may be influencing the evolution of the Beefmaster genome.

Breed Ancestry
The average Brahman genome content was 25.81 ± 8.01% (±standard deviation among sampled individuals) for Brangus and 27.60 ± 7.05% for Santa Gertrudis (Table 3), far less than the 37.5% based upon breed expectations. Likewise, the Beefmaster had considerably fewer haplotypes of Brahman origin of smaller than average length (Supplementary Table S1) resulting in an average Brahman genome content of 30.84 ± 7.48%, far less than the breed expectation of 50%. The average Shorthorn genome content in Beefmaster individuals was 22.93 ± 8.01% and the average Hereford genome content was 46.23 ± 6.00%. Table 3 shows that chromosome 5 consistently had the highest Brahman ancestry in all three breeds, but there was not a strong concordance between chromosomes with the highest or lowest Brahman ancestry across the breeds. Despite this, the correlations between estimated Brahman ancestry percentages across all 29 autosomes were 0.49 for Brangus and Santa Gertrudis, 0.43 between Brangus and Beefmaster, and 0.61 between Santa Gertrudis and Beefmaster which also share a Shorthorn ancestry. These correlations certainly indicate that, on a chromosomal basis, there is a tendency for the breeds to share elevated or reduced Brahman ancestry.  Table S1). For all three breeds, the proportion of Brahman content within the diploid genome increases with haplotype number, which we use as a proxy for the generation number of these individuals (see also Supplementary Figure S4). The results for individual chromosomes ( Figures 4A, 5A, 6A) are not quite as obvious, however, the evolution of chromosome 5 clearly differs from most of the other chromosomes in all three breeds (see also Table 3).

Genomic Divergence From Breed Expectation
Nominal significance thresholds to achieve a FDR <0.001 using the Benjamini and Hochberg (1995) procedure were −log 10 p 3.0527 for Brangus, 3.0506 for Santa Gertrudis, and 3.0193 for Beefmaster. Using these values, 88.6% of tests performed for Brangus, 85.2% for Santa Gertrudis, and 95.2% for Beefmaster were significant. Values of the −log 10 P i values for each American breed are plotted according to their chromosomal coordinates in Figure 7. These results reveal that most of the genomes of each of the American Breeds differ in Brahman composition from the expected breed proportion based upon pedigree when assuming selective neutrality and the absence of drift.

Regions With the Greatest Brahman Divergence From Breed Expectation
The 5 most differentiated windows for Brahman genome content within each of the American breeds are in Table 4. These windows vary in size due to variation in the distribution of SNP locations throughout the genome, but also because RFMix concatenates contiguous windows where all individuals have the same ancestral population origin of haplotypes. Except for the locus on chromosome 5, all regions were enriched for taurine alleles. Table 5 contains beef trait QTL from the CattleQTLdb within the 1 Mb region centered on the regions enriched for taurine alleles in Table 4, except for the regions on chromosomes 1 and 18 in Brangus which are separately discussed.
Nominal significance thresholds to achieve a FDR <0.001 using the Benjamini and Hochberg. (1995) procedure were −log 10 p 3.0317 for Brangus, 3.2030 for Santa Gertrudis, and 3.2399 for Beefmaster. Using these values, 91.23% of tests

Regions With the Greatest Brahman Divergence Between Early-and Advanced-Generations
The 5 most differentiated windows for Brahman genome content between early and advanced-generation animals within each of the American Breeds are in Supplementary Table S2. For all 15 regions, the Brahman proportion was greater in the advancedthan in the early-generation animals. Supplementary Table S3 contains beef trait QTL from the CattleQTLdb within the 1 Mb region centered on the differentiated region in Supplementary  Table S2.

Reference Panels
Reference panel sample sizes have been shown to have significant effects on the accuracy of RFMix estimates (Maples et al., 2013). Reference panel sizes for analyses of local ancestry in human have ranged from as few as 19 samples to more than 500 samples per population (Maples et al., 2013;Browning et al., 2016) and human effective population size has been estimated to be 3,100 for Europeans and 7,500 for Yorubans (Tenesa et al., 2007). In cattle, the effective population size of most taurine breeds has been estimated to be about 100 (Bovine HapMap Consortium et al., 2009) and, consequently, we would expect that smaller reference panel sizes would be sufficient to capture the haplotypic diversity present within cattle breeds than for human. However, random samples of 200 individuals from each ancestral breed may not be sufficient to capture a large proportion of the haplotypic diversity within these breeds and larger sample sizes would certainly capture more of the rare haplotypes potentially leading to a greater accuracy of local ancestry estimation in admixed individuals (Maples et al., 2013). This is particularly important considering that the individuals in the reference panels and the founders of the American Breeds are separated by up to 16 generations and selection and drift may have caused large differences in haplotype frequencies between members of the groups. This appears to have impacted the estimation of local ancestry for the Beefmaster animals where proportions of Shorthorn and Hereford within the genome varied significantly depending on whether the CRUMBLER or ANCESTRAL reference panels were used. This may have been because the representation of horned Hereford and Line 1 Hereford animals was limited by sample size in the CRUMBLER reference panel and animals from these Hereford populations would have been prevalent at the time that the Beef master was initially formed.

Generation Proxy, Breed, and Individual Ancestry
To examine genome evolution in these breeds, we utilized the total number of haplotypes present within the genomes of the animals as proxy for the generation number of the animal. Under selective neutrality we would expect the number of taurine and   indicine haplotypes present within individuals to increase with generation number due to recombination among homologous chromosomes at each meiosis. Indeed, we found quite strong correlations (70-80%) between the numbers of indicine and taurine haplotypes within Brangus and Santa Gertrudis, but less so (50%) in Beefmaster. Strong selection for alleles found predominantly in one of the breeds will reduce the strength of this correlation and reduce the accuracy of the proxy for the prediction of generation number. However, we did not have any animals with known generation numbers with which to directly validate the utility of the proxy. Moreover, the results for Beefmaster may have been affected by our use of the most likely breed of haplotype origin, an absolute assignment, rather than the marginal probabilities of breed assignment. If the marginal probabilities of haplotype assignment to Shorthorn and Hereford are similar, but tend to favor one breed throughout the genome, we could see large differences in predicted Hereford and Shorthorn proportions genome-wide when, in fact, this is an artifact, and the true differences genome-wide are small. This would also artificially impact the correlations between taurine and indicine haplotype numbers in Beefmaster. However, the results in Figures 4B, Figures 5B, Figures 6B are both encouraging and interesting. In all three figures, the animals with the fewest numbers of haplotypes within their genomes have by far the lowest Brahman genome content. We suspect that these animals are not early generation Brangus, Santa Gertrudis, or Beefmaster animals but are intermediary crosses (1/ 8 × 7/8 and ¼ × ¾) produced by cattle breeders generating first generation members of each of the breeds who decided to genotype these animals and provide them to their respective Breed Associations (see e.g., https://gobrangus.com/tperkinsbreeding-up/). Despite this, the two interesting features of these plots are that the proportions of Brahman within all three of these breeds are far less than would be expected based upon the mating schemes used to produce the animals, and the proportion of Brahman within the genomes of advanced generation animals is consistently increasing supporting our results from the direct comparison of early-and advancedgeneration animals. Our interpretation of these results are that breeders of American Breed cattle select very strongly for performance and type (coat color, polled, sheath score, performance) characteristics, particularly in early generation animals, resulting in a selective advantage for taurine alleles which creates widespread linkage drag throughout the genome dramatically reducing the Brahman content. However, in advanced-generation cattle, selection emphases change towards adaptive alleles (nutrient utilization, temperature tolerance, pathogen resistance) favoring indicine alleles. In the Beefmaster, the increase in Brahman genome content in advanced-generation animals appears to be primarily at the expense of the Shorthorn content (Supplementary Figure S5). Goszczynski et al. (2018) used STRUCTURE to estimate the Brahman proportion within the genomes of 100 Argentinean registered Brangus animals to be 34.7%, which is very close to the breed expectation of 0.375 based upon pedigree. We suspect that the difference between this result and our finding reflects a very different selection history within the U. S. and Argentinian registered Brangus populations. Paim et al. (2020a) used ADMIXTURE applied to genotypes for 59 registered U.S Brangus cattle and estimated the Brahman proportion of their genomes to be 29.6%, on average, slightly larger than found in this study, but below breed expectation. Their estimates of breed proportion by chromosomal location show considerable similarity to our results in Figures 1-3, and they also found that chromosomes 5 and 15 had the lowest and highest Angus contents, respectively (Table 3), despite a much smaller sample size (Paim et al., 2020a).

Genomic Divergence From Breed Expectation
To test genomic windows for divergence from breed expectations, we conducted a simulation to estimate the variance in the proportion of Brahman genome content that would be expected across loci in a large population of advanced generation individuals. The estimate of variance from the simulation was quite small Var θ i ≈ 0.01 and because the overall Brahman genome content of individuals from each of the breeds was considerably smaller than that expected under neutrality, we found that very large proportions (>85%) of the genomes of each of the breeds were diverged from breed expectations. However, these values are very similar to the finding of Decker et al. (2012), who estimated that 82.41% of the genome of registered American Angus cattle was exposed to strong selection. The values of θ and Var θ i used to calculate the Z-statistic are important for determining the number of performed tests that are deemed significant, however, the produced Z-statistics rank identically to the sample Brahman proportion estimates p i regardless of the values of θ and Var θ i that are used. Our simulations are probably not representative of the generational mix within the samples we received from each Breed Association and the sample size simulated may not be representative of the number of animals used to generate each generation and it is possible that Var θ i is considerably higher than the value we used. This would result in a smaller proportion of the genomes of each breed being found to have diverged from breed expectation. For example, if we had used Var θ i 0.02, 0.03, 0.04, or 0.05 (almost a 5-fold increase in standard deviation and 25 fold increase in variance) the statistical threshold for a FDR <0.001 would have been 3.16, 3.22, 3.32, and 3.45 in Brangus and we would have estimated that 69.5, 60.17, 47.3, or 35.6% of the Brangus genome was diverged from breed expectation, respectively. Whatever value of Var θ i is used, the most extreme test statistics are concordant with the most diverged genomic regions. Figure 7 reveals that the patterns of divergence are complex but quite similar among the breeds for several chromosomes. The entirety of chromosome 25 possesses the least divergence, while chromosomes 1, 5, 11, 12, and 18 possess regions of very high divergence for all three breeds -characteristic of loci that have been strongly selected. Entire chromosomes are highly diverged for their Frontiers in Genetics | www.frontiersin.org October 2021 | Volume 12 | Article 758394 Brahman composition including 15 in Brangus,3 and 4 in Santa Gertrudis,and 14,15,and 20 in Beefmaster. Chromosomes 8,9,13,17,20,22,24, and 29 reveal loci that have been exposed to selection in only a subset of the breeds. Selection for favorable alleles at multiple polygenes of large effect on a chromosome will generate a complex signature of selection that is dependent on initial phase relationships among the selected alleles and the relative intensities of selection applied to each of the alleles. Consequently, it is not likely to be a fruitful exercise to attempt to identity the underlying selected loci and phenotypes except perhaps for the largest effect loci and for loci that are well known to have been exposed to selection in these breeds. Goszczynski et al. (2018) found a significant enrichment for Brahman alleles within the BoLA region of chromosome 23 (chr23:28,720,113-28,724,739), however, we found no evidence for this (Figure 7) and once again speculate that this reflects a different selection history for the U.S and Argentinian populations.

Coat Color and Polled Loci Breed Characteristics
Brangus cattle are required to be polled (absence of horns) and to have solid black or red coat colors. Santa Gertrudis cattle may be horned or polled, with a light or dark solid red coat color preferred; roan (red coat with white patches) or white outside the underline or other spotting in other parts of the body disqualifies an animal from registration. Beefmaster cattle may be horned or polled and while brownish-red is the most common color, the breed has no color standards. Brahman cattle are primarily horned and have complex coat colors that can range from solid gray to brindled. Angus are polled due to the autosomal dominant Celtic polled allele, a complex structural insertion, at Polled located near the centromere of chromosome 1 (Brenneman et al., 1996;Medugorac et al., 2012) and have either solid black (American Angus Association) or red (American Red Angus Association) coat colors due to variation within MC1R located at 14,704,918-14,707,018 on chromosome 18. American Shorthorn can be horned or polled and can have red, white, or roan coat color patterns caused by variation at KITLG (Seitz et al., 1999) located at 18,245,986-18,317,616 on chromosome 5. American Hereford cattle can be horned or polled and have dark red to red-yellow coat colors. They are piebald with white on the head, ventral areas, lower legs, and tail switch which is inherited as an autosomal dominant due to structural variation located proximal to KIT at 70,157,944-70,262,786 Mb on chromosome 6 ( Grosz and MacNeil, 1999;Whitacre, 2014). QTL associated with variation in white spotting in cattle have also been identified on chromosome 6 near KIT (Reinsch et al., 1999;Liu et al., 2009). Figure 7 confirms the selection of Angus alleles (Figure 1) at Polled and MC1R in Brangus (windows with strongest signal Chr1:376,458,855; p i 0.02, -log 10 p 240.08; Chr18: 13,773,896,455;p i 0.04, −log 10 p 212.80) and Shorthorn alleles (Figure 2) at MC1R in Santa Gertrudis (Chr18:17,876,944,337; p i 0.03, −log 10 p 232.08), respectively. These windows were among the 5 most differentiated windows for Brahman genome content in both breeds (Table 4). Paim et al. (2020b) found evidence of strong enrichment of Angus alleles at Polled in 59 Brangus animals, but not at MC1R. Ancestral breed frequency signatures on chromosome 5 are impacted by a very strongly selected locus enriched in frequency for Brahman alleles in all three of the American Breeds towards the center of the chromosome (Figures 1-3). Despite this, Figures 2, 3 show a reduced proportion of Shorthorn ancestry in Beefmaster, but an increased proportion of Shorthorn ancestry in Santa Gertrudis animals in the vicinity of KITLG on chromosome 5 presumably due to selection for the allele conferring red coat color and against that conferring white in Santa Gertrudis. Regions on chromosome 6 near KIT were strongly enriched for Angus alleles in Brangus (Chr6:70,550,635,750; p i 0.14, −log 10 p 107.30), Brahman alleles in Santa Gertrudis (Chr6:71,137,233,574; p i 0.51, −log 10 p 38.49), and Brahman alleles in Beefmaster (Chr6:68,154,192,608; p i 0.66, −log 10 p 47.03). This suggests that selection has occurred for solid coat color patterns in all three breeds for both aesthetic and economic reasons due to losses caused by ocular squamous cell carcinoma, which is most common in Hereford animals that lack pigment around their eyes (Heeney and Valli, 1985).

Large Effect Locus Enriched for Ancestral Brahman Alleles
As the global proportion of indicine ancestry increases in B. t. taurus × B. t. indicus hybrids, coat color lightens, males tend to have more pendulous sheaths, body weight and condition scores increase, and tick and worm burdens decrease (Porto-Neto et al., 2014). B. t. indicus cattle tend to have a lower performance than B. t. taurus cattle under favorable conditions, but significantly outperform B. t. taurus cattle in extreme climates and environments where parasites, heat, and low inputs play important roles in the production system (Frisch and Vercoe, 1977;Frisch and Vercoe, 1984;Menjo et al., 2009;Porto-Neto et al., 2014). In almost all cases, the most significantly diverged regions are enriched for taurine alleles (Figures 1-3 and Figure 7). However, Brahman alleles are strongly enriched in frequency in all three of the American breeds in the central region of chromosome 5 (Figures 1-3). The window with the greatest Brahman composition was Chr5: 48,183,275,554; p i 0.59, −log 10 p 96.10 in Brangus, Chr5:42,298,482,771;p i 0.79, −log 10 p >300 in Santa Gertrudis,and Chr5:48,090,159,989; p i 0.84, −log 10 p 212.56) in Beefmaster. The window detected in Santa Gertrudis was almost 6 Mb from those found in Brangus and Beefmaster, but Figure 2 shows strong enrichment for Brahman alleles across the majority of chromosome 5 in Santa Gertrudis suggesting that selection may be acting on several loci on this chromosome in this breed. Querying the concatenated region Chr5:42,298,275,554  cattle that have previously been reported in studies of indicine or indicine × taurine crossbred cattle include Birth Weight (Peters et al., 2012;Saatchi et al., 2014b), Male and Female Reproduction (Hawken et al., 2012;McDaneld et al., 2014;Buzanskas et al., 2017), Stature, Yearling Weight, Carcass Weight and Longissimus Dorsi Muscle Area (Imumorin et al., 2011;Pryce et al., 2011;Saatchi et al., 2014b), and intramuscular fatness measured as Marbling Score (Peters et al., 2012;Leal-Gutiérrez et al., 2019). Also within this region are QTL that have been found to be associated with Immune Function (Leach et al., 2010), Susceptibility to Bovine Respiratory disease (Neupane et al., 2018), and Tick Resistance (Gasparin et al., 2007). The genes closest to the windows with the greatest Brahman ancestry in Brangus and Beefmaster are HMGA2 (Chr5: 47,819,966,336) and MSRB3 ( Chr5: 48,330,[41][42][43][44][45][46][47][48]511,868). In human, HMGA2 has been associated with 62 traits (https://www.ebi.ac.uk/gwas/genes/ HMGA2; Accessed Sept. 18, 2021), including body height, birth weight, systolic blood pressure, brain cortical volume, intelligence, and insomnia. Brahman females are known to suppress the birth weight of their calves (Dillon et al., 2015) to increase calving ease and reduce calf mortality. MSRB3 has been associated with 33 traits in human (https://www.ebi.ac. uk/gwas/genes/MSRB3; Accessed Sept. 18, 2021) including brain cortical volume, snoring, lung function, height, and temperament. The window with the greatest Brahman ancestry in Santa Gertrudis overlaps CPNE8, associated with 14 human traits (https://www.ebi.ac.uk/gwas/genes/CPNE8; Accessed Sept. 18, 2021) including chronotype and circadian rhythm, IgG glycosylation, heart rate, and bipolar disorder. Consequently, this broad genomic region includes genes which may explain several of the physiological differences between B. t. taurus and B. t. indicus cattle, including birth weight, height, blood vessel morphology, hair morphology, disease resistance, and temperament.

Large Effect Loci Enriched for Ancestral Taurine Alleles
We queried CattleQTLdb for the remaining 12 loci in Table 4 to identify candidate QTL responsible for the enrichment of taurine alleles at these highly diverged loci and the results are in Table 5. We performed the query for 1 Mb regions centered on the window centers reported in Table 4 and restricted our attention to loci responsible for variation in traits known to be under artificial selection or potentially under natural selection in U. S. beef breeds and discovered in taurine or B. t. taurus × B. t. indicus hybrid populations. We also collapsed database entries from the same author with separate QTL identities into a single QTL when they co-localized within the genome. For example, the query for chromosome 14 region retrieved 5 database entries for Birth Weight from Snelling et al. (2010) with QTL identifiers 68615,68626,68630,68658,and 68672 with coordinates Chr14:23,893,893,240,Chr14: 23,946,946,456,Chr14:23,571,571,874,Chr14:23,853,853,831,and Chr14: 23,421,421,953,respectively. These clearly all represent the same Birth Weight QTL and were collapsed into a single entry in Table 5. However, this feature of the CattleQTLdb makes it difficult to perform enrichment analyses, since the numbers of trait associations detected in any study is dependent upon the number of markers used. The CattleQTLdb contains 161,781 QTL from 1,049 publications representing 680 different traits (https://www.animalgenome. org/cgi-bin/QTLdb/index; Accessed August 11, 2021). The bovine autosomal genome length is 2,489.39 Mb in the ARS-UCD1.2 bovine reference genome assembly and so if QTL are randomly located through the genome, we would expect to find an average of 64.98 QTL within each 1 Mb window. We retrieved an average of 61.25 ± 45.49 QTL or Association entries for 27.83 ± 12.07 traits within the 12 genomic windows in Table 5 indicating that these genomic regions are typical for their QTL content.
Two features of These data suggest that we should have found, on average, 1.41 QTL or Association entries within the CattleQTLdb per 1 Mb autosomal region. From the Poisson distribution, the probability of 1 or more CattleQTLdb entries per region is 0.76 and from the Binomial distribution, the probability of 7 or more regions possessing CattleQTLdb entries is 0.95. The second feature is that 11 of the 12 differentiated regions harbor QTL for Birth Weight or Calving Ease for which 1,180 and 4,105 QTL or Association entries are reported, respectively, in the CattleQTLdb. This corresponds to an average of 2.12 database entries per 1 Mb window and a probability of 0.57 of 11 or more of the windows having CattleQTLdb entries for Birth Weight or Calving Ease. However, the distributions of database entries for Birth Weight and Calving Ease by chromosome are not random, with 511 (43.3%) entries for Birth Weight on chromosome 6 and 1,968 (47.9%) entries for Calving Ease on chromosome 21. Removing these outlier chromosomes from the analysis leads to the expectation of 1.22 database entries per 1 Mb window and a probability of only 0.09 of finding entries for Birth Weight or Calving Ease in at least 11 of 12 randomly sampled 1 Mb windows.
Ten of the regions in Table 5 have CattleQTLdb entries for growth and weight traits and 7 of these are for Weaning Weight. We found 616 CattleQTLdb entries for Weaning Weight, again with an overrepresentation on chromosome 6 (146 or 24%). After removing chromosome 6 from the analysis, this corresponds to an average of only 0.20 database entries per 1 Mb window for the remainder of the autosomal genome and a probability of only 0.002 of finding at least 7 out of 12 randomly sampled 1 Mb genomic regions containing database entries for Weaning Weight. Four of the regions in Table 5 have CattleQTLdb entries for Marbling Score for which there are 441 CattleQTLdb entries corresponding to an average of only 0.18 database entries per 1 Mb window for the autosomal genome. This generates a probability of 0.12 of finding at least 4 out of 12 randomly sampled 1 Mb genomic regions containing database entries for Marbling Score. Finally, there are 19,572 entries in CattleQTLdb for Fertility indicating that we should have expected to find fertility trait QTL in all 12 of the genomic regions reported in Table 5.
These results suggest that artificial selection on production traits practiced by breeders of American Breed cattle is responsible for the surfeit of taurine alleles at the 12 loci in Table 5. Our analyses suggest that American Breed cattle breeders are placing the greatest selection emphases on growth, calving ease, and meat quality traits, which is consistent with the findings of Decker et al. (2012) in U.S registered Angus cattle.

Genomic Divergence Between Early-and Advanced-Generations
Although statistical power was limited when comparing the Brahman content of the genomes of early-and advancedgeneration animals due to the use of only ∼10% of the animals from each breed in each tail of the haplotype number distribution, these analyses clearly revealed that the Brahman proportion of the genome of these American breeds has increased with generation number and that this increase has occurred almost genome-wide.
There are several interesting features of Supplementary  Table S3. First, we found two of the most differentiated 1 Mb regions between early-and advanced-generation American Breed animals to have no previously identified QTL within them. Considering an average of 64.98 QTL within each 1 Mb window for the CattleQTLdb, the probability of two such intervals in 15 randomly sampled 1 Mb genomic regions is small (p 3.88E-55), indicating that the selected loci in these regions impact traits that have not been well studied in cattle, suggesting that they may have roles in environmental adaptation. Second, the locus on chromosome 6 at 71.5 Mb in Beefmaster supports our conjecture that Beefmaster breeders have selected for solid coat colors in advanced-generation animals, despite the fact that the breed does not have coat color standards. Finally, if we remove the locus on chromosome 6 at 71.5 Mb in Beefmaster, 11 of the 12 genomic regions with identified QTL harbor QTL that influence heat tolerance or animal health, immune response, parasite burden or disease susceptibility. The probability of 11 or all 12 regions harboring disease or parasite CattleQTLdb entries reduces to 0.18. These results suggest that these genomic regions have been selected to increase the proportion of Brahman alleles because they confer an adaptive advantage in American Breed cattle.

CONCLUSION
The American breeds are advanced generation B. t. taurus × B. t. indicus hybrid cattle that were developed to capitalize on breed complementarity and heterosis to produce cattle that were suited to harsh, subtropical climates as well as disease and parasite threats, while still maintaining acceptable levels of growth and productivity. The breeds employed mating systems designed to produce cattle that were either ⅝ taurine and ⅜ indicine (Brangus and Santa Gertrudis) or ½ taurine and ½ indicine (Beefmaster). However, we found strong evidence that selection for polledness, coat color phenotypes, growth, calving ease, and intra-muscular fat content produced early-generation cattle with much smaller than expected indicine composition within the genomes of all three breeds. At least 85% of the genomes within each of the breeds possess less Brahman ancestry than expected in the absence of selection and drift. We utilized the total number of haplotypes predicted by RFMix in the diploid genome of each animal as a proxy for the generation number for each animal and when we ranked animals based on this proxy, we detected an increase in the Brahman genome content in advanced-generation cattle of all three breeds. By comparing the genomes of early-and advanced-generation animals within each breed, we found evidence for strong selection for indicine alleles which likely confer an adaptive advantage for heat tolerance and healthfulness in advance-generation animals.

DATA AVAILABILITY STATEMENT
The datasets analyzed in this study belong to the respective Breed Associations. For access to the data, contact: Darrell Wilkes, Executive Vice President, International Brangus Breeders Association: dwilkes@gobrangus.com. Collin Osbourn, Executive Vice President, Beefmaster Breeders United: cosbourn@ beefmasters.org, Webb Fields, Executive Director, Santa Gertrudis Breeders International: wfields@santagertrudis.com.

ETHICS STATEMENT
Ethical review and approval was not required for the animal study because all animals for which samples were obtained for genotyping in this study were either sampled via cryopreserved semen or tissue samples (e.g., hair, blood) obtained by the owners of the cattle according to standard industry practices. Written informed consent for participation was not obtained from the owners because genotype data are owned and were provided to us by the three Breed Associations.

AUTHOR CONTRIBUTIONS
All authors contributed to the overall design and approaches used to conduct the research and read and edited the final manuscript. TEC and JFT conducted the analyses and wrote the manuscript. RDS managed the genotypes and prepared files for the analyses. JED oversaw the development of the genotype imputation pipeline and generated the imputed data set.