Genome-Wide Analyses Reveal the Genetic Architecture and Candidate Genes of Indicine, Taurine, Synthetic Crossbreds, and Locally Adapted Cattle in Brazil

Cattle population history, breeding systems, and geographic subdivision may be reflected in runs of homozygosity (ROH), effective population size (Ne), and linkage disequilibrium (LD) patterns. Thus, the assessment of this information has become essential to the implementation of genomic selection on purebred and crossbred cattle breeding programs. In this way, we assessed the genotype of 19 cattle breeds raised in Brazil belonging to taurine, indicine, synthetic crossbreds, and Iberian-derived locally adapted ancestries to evaluate the overall LD decay patterns, Ne, ROH, and breed composition. We were able to obtain a general overview of the genomic architecture of cattle breeds currently raised in Brazil and other tropical countries. We found that, among the evaluated breeds, different marker densities should be used to improve the genomic prediction accuracy and power of genome-wide association studies. Breeds showing low Ne values indicate a recent inbreeding, also reflected by the occurrence of longer ROH, which demand special attention in the matting schemes to avoid extensive inbreeding. Candidate genes (e.g., ABCA7, PENK, SPP1, IFNAR1, IFNAR2, SPEF2, PRLR, LRRTM1, and LRRTM4) located in the identified ROH islands were evaluated, highlighting biological processes involved with milk production, behavior, rusticity, and fertility. Furthermore, we were successful in obtaining the breed composition regarding the taurine and indicine composition using single-nucleotide polymorphism (SNP) data. Our results were able to observe in detail the genomic backgrounds that are present in each breed and allowed to better understand the various contributions of ancestor breeds to the modern breed composition to the Brazilian cattle.


INTRODUCTION
Cattle domestication has generated many differentiated phenotypes that evolved into distinct breeds scattered across the globe. Evidence based on mitochondrial DNA (mtDNA) data indicates that divergence between taurine and indicine subspecies occurred approximately 10,000 years ago, long before domestication (Loftus et al., 1994). It is estimated that taurine cattle emerged in the Fertile Crescent region spanning modernday Iraq, Syria, Lebanon, Israel, Palestine, Jordan, the northeast and Nile valley regions of Egypt, the south-eastern region of Turkey, and the western fringes of Iran. On the other hand, indicine cattle emerged in the Indus Valley, including northeast Afghanistan, including a great part of Pakistan, and western and northwestern India (Ajomone-Marsan et al., 2010). Some specific traits that shaped the domesticated bovine species took place long after domestication started, and artificial selection brought even more complexity and diversity to cattle subspecies in accordance with their evolving environment.
In Brazil, the first taurine animals were introduced by the European conquerors between the sixteenth and seventeenth centuries to be used for food, leather, and animal traction. After several generations of random crossings in distinct and variable ecosystems throughout the country, these animals became adapted to a wide range of environments and showed different levels of improved fitness to local conditions. They become recognized as Iberian-derived locally adapted breeds (Egito et al., 2007;McManus et al., 2009), such as the Caracu lineages (CaracuCaldeano for dairy purpose and Caracu for beef), CriouloLageano, CurraleiroPé-Duro, and CriouloLageano do RS.
In addition to the taurine breeds, indicine cattle were brought to Brazil in the nineteenth century (Medrado, 2018). Since then, indicine breeds (e.g., Nelore and Gir) have been extensively used to increase the production and robustness of the cattle herds throughout continental Brazil. The indicine breeds have also been successfully used to the generation of crossbreds and synthetic breeds in Brazil such as Girolando dairy cattle and Canchim beef cattle (Buzanskas et al., 2015). Thus, Brazil currently displays four genetic groups of cattle: taurine, indicine, locally adapted, and synthetic crossbreds.
Population history, breeding systems, and geographic subdivisions of the Brazilian cattle may be accessed through runs of homozygosity (ROH) and linkage disequilibrium (LD) patterns throughout the genome. Runs of homozygosity are continuous homozygous segments of DNA (Gibson et al., 2006), while LD is a nonrandom association between alleles at different loci (Ardlie et al., 2002). The calculation of these parameters is essential to perform genome-wide association studies (GWAS) and also to implement genomic selection (GS) in cattle breeds.
Several studies have shown that genomic associations and estimated breeding values (EBVs) are affected by the LD pattern among populations (Meuwissen et al., 2001;Hayes et al., 2009;Marigorta and Navarro, 2013;Veroneze et al., 2014). In addition, Luan et al. (2014) showed that predicting genomic EBV based on ROH resulted in more accurate estimates when compared to EBV generated through linkage analysis relationships. Runs of homozygosity information also allows the identification of islands of homozygosity shared among individuals, which may be the result of selection events (Zhang et al., 2015). Thus, the knowledge of LD and ROH patterns in purebred and crossbred cattle has become essential for designing efficient genomic selection programs.

Data
Samples used in this study were obtained from Brazilian commercial herds and sent by the owners to Embrapa Research Centers, Brazil. In the case of males, semen samples were sent, and in the case of females, blood aliquots were sent from routine collection to monitor Brucellosis disease in the herds. Since biological samples from all animals were collected by the proprietors during routine zootechnical practices in their commercial herds, this procedure does not fit in the Brazilian National Council for the Control of Animal Experimentation law. In this way, an ethic statement was not required to generate the results in this manuscript. It is important to note that all the animals used in this study were chosen according to their population representativeness in each breed aiming to avoid bias due to the use of a limited number of lineages in the population. Thus, to safeguard the genetic variability of each breed, we used the principal components analysis to define the best breed representative samples for those with a high number of genotyped animals. For smaller samples, such as the locally adapted breeds, we did consider the pedigree and geographic distribution of the sampled animals. Animals were genotyped using commercial genotyping arrays Illumina Bovine HD, Bovine SNP50 Genotyping BeadChip (Illumina, San Diego, CA, United States), and GeneSeek GGP BosIndicus HD chip (Neogen, Lansing, MI, United States). A standardized set of single-nucleotide polymorphisms (SNPs) based on the overlapping of the SNPs among the chips was used as the data set to analyze each breed. SNP genotypes were excluded from the analysis when the call rate <0.95 and minor allele frequency <0.01. In addition, samples showing overall call rates <0.85 were excluded from further analysis. A total of 432 taurine, 366 indicine, 281 locally adapted, and 241 crossbred animals were analyzed after quality control (Table 1). Moreover, it is important to note that the remaining markers covered the whole bovine genome.

Linkage Disequilibrium and Its Decay Over Time
To evaluate the extent of LD, we used the r2 statistics, since this approach is robust and not sensitive to gene frequency and effective population size (Terwilliger et al., 2002;Zhao et al., 2007). PLINK v1.07 (Purcell et al., 2007) was used to calculate pair-wise linkage disequilibrium between SNPs using a genomic distance of 1 Mb by the following command:./plink -bfile filename -cow -ld-window 5 -ld-window-kb 1000 -ld-window-r2 0 -not-chr 0 30-33 -out outname -r2.
Parametric nonlinear regression models have been used to study the physical distance between markers (LD decay) as reported by Heifetz et al. (2005), Amaral et al. (2008), Abasht et al. (2009), andVeroneze et al. (2014). Thus, the pairwise r 2 ij from each population were regressed on the distance between the marker pairs using a nonlinear model described by Sved (1971): where LD ijk was the observed r 2 ij between SNPs i and j in breed k; d ij was the distance in kb (kilobase pair) between SNPs i and j; ß k was the coefficient that describes the LD decay with distance for breed k; and e ijk was a random residual defined as e ijk N(O, s 2 ). In summary, according to Sved (1971), smaller ß k estimates indicate a higher extent of LD. Additionally, this model is useful under situations involving individual's relationship through infinitely long pathways from remote ancestors and also by descent from recent ancestors, such as bottlenecks at some point in history followed by admixture with modern breeds mainly derived from semen importation.
A test for the equality of curves was based on ß k estimates to compare the nonlinear curves in all breeds. This test allowed a statistical comparison of the LD decay parameter (β) and the visualization of clusters among breeds. Clustering was performed based on NbClust package v. 3.0 (Charrad et al., 2014) of software R (R Core Team), and groups were identified based on Euclidian distance under Ward aggregation method. These indices were proposed according to the choice of the best grouping based on the maximum and minimum differences, respectively, between and within group indices. Based onß values clustering, a dendrogram was generated using the hclust function on R software v. 4.0.0 (R Core Team) in order to point out for similarities among the contrasting groups. The nonlinear models were fitted using the nls function of R software (R Core Team).

Effective Population Size
The effective population size (N e ) in the corresponding generation (T) was estimated from the associated estimated average of r 2 . Thus, N e was estimated across generation according to the following formula: where c is the marker distance in Morgan, assuming 1 Mb = 1 cM, and T = 1/2c (Hayes et al., 2003;Meuwissen and Goddard, 2007). From the estimated r 2 , the average values were grouped into four ranges of distances (<0.01 Mb, 0.011-0.03 Mb, 0.031-0.05 Mb, and >0.05 Mb) and used to determine the N e in 50, 16.6, 10, and 5 generations, respectively.

Runs of Homozygosity
To date, studies have used different methodologies for predicting ROH. Peripolli et al. (2017) evaluated several ROH studies in livestock and found that there is little consensus on the parameters used to define ROH. In this way, we established the parameters and thresholds after several "pilot" analyses as showed by Peripolli et al. (2017). The software PLINK v1.07 (Purcell et al., 2007) was used for each population with the following parameters: minimum window length of 120 SNPs, maximum gap size between two SNPs of 1,000 kb, minimum ROH length of 1,000 kb, minimum number of potential marked SNPs of 50, one heterozygote allowed per window, maximum of five missing calls per window, sliding window length of 50 SNPs, and proportion of overlapping windows that must be a homozygous >0.05.
In addition, to identify ROH islands throughout the genome of each breed, homozygous segments shared by more than 50% of the individuals were chosen as an indication of homozygosity. The "-homozyg-group" function implemented in PLINK was used to assess ROH islands shared among individuals. The GenBank annotation based on the ARS-UCD1.2 assembly of the bovine genome was used to identify genes in ROH regions. Gene networks highlighting biological processes among the gene sets identified for each breed group (taurine, indicine, locally adapted, and synthetic crossbreds) were generated using the ClueGO plugin for Cytoscape (Bindea et al., 2009). ClueGO takes one or more set of genes and pursues for Gene Ontology (GO) term or pathways establishing edges between each gene and GO chosen term using a two-sided hypergeometric test and Bonferroni correction. This procedure allowed the creation of gene networks highlighting biological roles and the comparison of gene clusters by visualizing their functional differences or similarities.

Breed Composition
The ADMIXTURE package (Alexander et al., 2009), which places individuals into K predefined clusters based on genotype data, was used to evaluate the hierarchical clustering across the 19 evaluated breeds. Ancestry was estimated using a reduced panel (4,547 SNP) pruned by LD between subsequent markers and randomly selected animals composing reduced groups of up to 33 animals per breed. Considering our predefined breed origins (taurine, indicine, synthetic breed, and locally adapted), we evaluated all populations by using K values ranging from 2 to 5. To investigate the phylogenetic relationships among populations, the Euclidean distance between populations was evaluated using dartR package (Gruber et al., 2018) in R, and then, a phylogenetic tree was constructed using a neighbor-joining approach with APE program (Paradis et al., 2004). The principal component analysis (PCA) was performed with SNPs data using the Variation Suite Golden Helix v8.9.0 software (Golden Helix Inc., Bozeman, MT, United States).

Linkage Disequilibrium
The LD decay parameter (β) was estimated from the curves of r 2 values over genomic distances (bp) from each population using a nonlinear regression model ( Table 2). Statistical tests for the equality of LD decay curves showed that the overall pattern of the inflection point differed among breeds of taurine, indicine, and locally adapted groups, except for synthetic crossbreds (Figure 1). Since smaller values ofß indicate a higher extent of LD, it was used for clustering the different breeds accordingly to these estimates (Figure 2). Based on that, we were able to observe four distinct clusters. CAN, CRI, and GIO (cluster 1) showed the smallest extent of LD values, while CUR and JER (cluster 4) showed the highest extent of LD values.

Effective Population Size
N e was estimated across four generations classes (5, 10, 16.6, and 50) according to markers distances in the genome, in which the shorter is the marker distance, the higher is the estimated generation (Figure 3). To observe the behavior of N e across generations in each population, the difference between generations 50 and 5 was calculated (Supplementary Table 1). BRA showed the largest difference in N e values, ranging from 123.53 at generation 50 to 14.3 at generation 5. On the other hand, JER showed the smallest values, ranging from 48.55 to 9.79 at generation 50 and 5, respectively. CUR breed showed the smallest N e value (9.02) at generation 5.

Runs of Homozygosity
The pattern of ROH differed across breeds (Figure 4). Considering all breeds, CUR showed the highest percentage of long-range ROH (> 31Mb; >10%), while ANG, NEL, TAB, GIO, and PAN showed the higher percentages of short range ROH (<5 Mb; >60%). In addition, based on ROH islands analysis, seven regions were observed to be shared by more than 50% of the individuals in JER (taurine); BRA, IND, and TAB (indicine); CCD, CCB, and CUR (locally adapted) breed groups ( Table 3). Gene networks highlighting biological associations were generated based on genes found on ROH islands  ( Supplementary Figures 1-3). Several biological processes were highlighted, including neutrophil-mediated killing of bacteria, interferon-alpha response, and prolactin signaling pathway.

Breed Composition
Breed composition for each population was obtained using the admixture analysis ( Figure 5). With K = 2, we observed that each group showed distinct genomic compositions representative of taurine (red) and indicine breeds (blue). Locally adapted breeds display a taurine genome content ≥65%. In addition, it was consistently shown that synthetic crossbreds shared taurine and indicine genome content. At K = 3, two taurine ancestries were discriminated (light green and red), whereas at K = 4, a third taurine ancestry was predominant in the locally adapted breeds (purple), and the light green taurine ancestry was predominant in the JER breed. At K = 5, ANG breed showed a distinct taurine ancestry (yellow), whereas CAN displayed four uniform taurine composition (red, green, yellow, and purple). At this K, we could notice four taurine breeds showing three well-defined taurine ancestry (yellow, 68%; red, >82%; and light green, 88%). GIO breed showed the red taurine ancestry of 52%, while in CAN breed, the taurine ancestry was shared by the red (17%), green (13%), yellow (16%), and purple (21%) components. Among the indicine breeds, we observed a well-defined indicine ancestry in blue (>92%), whereas NEL and GIR breeds showed the highest ancestries (99% and 100%, respectively). The locally adapted breeds showed mostly the purple taurine ancestry (>43%), and CCD was the most homogeneous (95%). In addition, the neighbor-joining analysis also clearly discriminated the history of development and the ancestry of each group of breeds ( Figure 6). As expected, indicine, taurine, and locally adapted breeds showed very distinct and separated clustering. The crossbred GIO was grouped with taurine breeds closer to HOL, while CAN grouped with the indicine breeds closer to BRA. The PCAs (Figure 7) were able to bring more details in the clustering pattern of the breeds and showed six distinct groups. The indicine breeds clustered very close together (group 3), and the taurine breeds showed two distinct clusters: group 1A (HOL and BWS) and group 1B (ANG and JER). The locally adapted breeds clustered together in group 4, while the crossbred GIO clustered in group 2A, and the synthetic CAN clustered in group 2B.

DISCUSSION
The LD decay observed in this study among all cattle breeds differed based on LD curves parameter estimates (ß). However, all breeds followed a similar pattern of rapid decrease in LD as the physical distance increases in the genome. Previous LD decay studies in different breeds were performed using the average r 2 and showed the same pattern of LD decrease (McKay et al., 2007;Lu et al., 2012;Zhu et al., 2013;O'Brien et al., 2014;Mokry et al., 2014;Toro Ospina et al., 2019;Xu et al., 2019), although they did not use a linear model to analyze the equality of the curves as performed in this study. We were able not only to observe the existence of breed differences but also do a pairwise comparison across all breeds. Based on this test, we were able to identify four LD decay clusters.
The extent of LD provides an insight about the number of SNPs required for GS and GWAS. Breeds that clustered together showed the same LD decay pattern, suggesting that the same marker density could be used in genomic studies for each of these breeds. However, this does not imply that the same marker set is suitable for a particular breed cluster group since different markers may be segregating in different breeds. For example, in cluster 3, we observed the presence of taurine and indicine breeds (HOL and NEL, respectively) highlighting the similarity of LD extent, although they have different genetic backgrounds. Thus, the set of SNPs that provides the best prediction accuracy at a given marker density should be different between HOL and NEL. We also observed different extents of LD between clusters 1 and 4, with higher LD observed in cluster 4 and lower in cluster 1. This information implies that different marker densities may be used for GS and GWAS for these breeds, which could also influence the accuracy of GS. However, in a practical way, a minimal marker density of 60 kb would be useful for all breeds besides removing monomorphic SNPs.
A run of homozygosity is the probability that all consecutive markers on a pair of homologous chromosome segments, in the same or different individual(s), display identical alleles (Hayes et al., 2003). The extent and frequency of ROH may provide information about the ancestry of an individual and its population. Even thought different threshold values could be implied from different results for the same population, our aim was to compare breeds according to their ROH. For that, we used the same parameters and thresholds for all breeds, and we were able to observe relevant different patterns of ROH among them. Moreover, inbreeding may be inferred from the presence of long ROH, with longer segments indicating recent inbreeding within a population (Kirin et al., 2010) and suggesting a small effective population size. Thus, we also evaluated the N e to better understand the relationship between ROH and N e . The CUR breed showed the lowest N e value in the estimated generation 5, suggesting a recent inbreeding that is supported by the occurrence of the longest ROH found in this study. Using microsatellite markers, Egito et al. (2007) also detected inbreeding in this breed, which is a locally adapted breed originated from taurine animals brought to Brazil during the first decades of colonization. According to Carvalho (2015), this breed comprises approximately five thousand individuals only. Based on our results, special attention should be given to matting schemes in this breed to preserve the germplasm. On the other hand, further locally adapted breeds such as Caracu (CCB and CCD), CRI, and PAN seem to be successful in relation to matting schemes, since these breeds showed higher N e and prevalence of small ROH. As expected, synthetic crossbreds (GIO and CAN) showed high occurrence of small ROH and N e , since these breeds were recently originated from a systematic crossing breeding up to the establishment of each breed. The BRA breed showed the largest differences in N e values across generations, 123.53 at generation 50 to 14.3 at generation 5, suggesting strong inbreeding events along generations that could be explained by the breeding programs that presumably minimized the inbreeding occurrence in this breed. JER, BRA, IND, TAB, CCD, CCB, and CUR breeds showed ROH islands in more than 50% of the individuals. From those islands, genes were retrieved and their biological processes analyzed aiming to understand their roles in each breed. In JER, we identified genes in chromosome 7 associated with lipid translocation (ABCA7) and neutrophil-mediated killing of bacteria (AZU1 and ELANE). ABCA7 gene encodes for ATP binding cassette subfamily A member 7 protein, previously reported to be expressed in the mammary gland of lactating cows (Farke et al., 2008) and related with lipid transport (Ontsouka and Albrecht, 2014). AZU1 and ELANE encode for azurocidin 1 and elastase, neutrophil expressed proteins, respectively, which are associated with the immune system (Cassatella et al., 2019). These results corroborate the fact that Jersey animals are known to exhibit less disease (e.g., clinical  mastitis) (Washburn et al., 2002) and higher fat content in milk (Carroll et al., 2006). ROH islands were observed on chromosome 14, 6, and 1 in the indicine breeds BRA, IND, and TAB, respectively. In the homozygous regions of BRA, we found a gene related to behavioral fear response (PENK). It is reported that animals with Brahman genetics may be considered more reactive to humans when compared with taurine breeds, such as Angus and Hereford (Boissy et al., 2005). In the IND ROH islands, we found the positive regulation of estradiol secretion biological process associated with SPP1 gene that encodes for secreted phosphoprotein 1, in which variations have been associated with lactation persistency in dairy cattle (Bissonnette, 2018) and mammary health status (Alain et al., 2009). We found IFNAR1 and IFNAR2 genes, associated with response to interferon-alpha biological process, in the chromosome 1 of TAB breed. These genes have been reported to be expressed in bovine placentome tissues during early pregnancy, associated with immune tolerance protection of the mother and fetus (Wang et al., 2018). Indicine cattle are well known by their rusticity, so the presence of ROH islands linked to genes associated with the immune system and health status was expected.
We also observed ROH islands in locally adapted breeds CCD, CCB, and CUR. Interestingly, CCD and CCB shared a small segment of their islands on chromosome 20, in which the SPEF2 gene was found. This gene encodes for sperm flagellar 2 protein, which is suggested to play an important role in elongating spermatids to ensure proper male germ cell differentiation in mice (Lehti et al., 2017). In addition, homozygosis mutations in this gene have been reported to be associated with sperm morphological abnormalities and male infertility in humans  (Liu et al., 2020). Based on that, this gene could be a potential candidate for male fertility in cattle and should be further studied.
Furthermore, from the genes found in ROH islands of CCD breed, we found the prolactin receptor gene (PRLR), which was found to be associated with the prolactin signaling pathway biological process. Likewise, variations in this gene have been associated with milk performance traits (Zhang et al., 2008;Lü et al., 2011). On the other hand, a highlighted biological process detected in the CCB breed was the negative regulation of T-cellmediated cytotoxicity, associated with IL7R gene. In humans, variations in this gene have been linked with autoimmune risk (Al-Mossawi et al., 2019), while in livestock animals, it has been described to be associated with growth traits (Jia et al., 2019;Zhu et al., 2019), which corroborates the purpose of this breed formation as a beef cattle.
In the locally adapted CUR breed, we found an ROH island on chromosome 11, and the biological process network highlighted a subnetwork involved with negative regulation of JAK-STAT cascade, linked to LRRTM1 and LRRTM4 genes. JAK-STAT cascade is critical to cellular events such as immune development (Rawlings et al., 2004). In addition, LRRTM1 and LRRTM4 genes encode for leucine-rich repeat transmembrane neuronal 1 and 4, respectively. It is suggested that leucinerich repeat proteins family is functionally related to multiple diseases in humans (Winther and Walmod, 2014). Considering the known adaptability and rusticity of CUR breed and the immune system-related genes identified in this study, we could suggest that CUR breed has been successfully selected for rusticity regardless of its small population effective size found in this study.
The breed composition for each population was obtained using the admixture model, neighbor-joining analysis, and PCAs. In this study, we evaluated the proportion of genotypes across the 19 breeds by altering the K value from 2 up to 5. At a K = 2, distinct genotypes representative of taurine and indicine breeds were observed. The taurine ANG breed showed what seems to be a composition from indicine background (18%). A reasonable explanation for that may be the use of indicine animals for cross breeding or another population founder effects. Indicine breeds were used in the beginning of their production in Brazil aiming to obtain adapted animals for the tropical weather (Madalena et al., 2002). This is supported by the ROH results from this study, in which a great occurrence of short ROH was observed for ANG breed, in agreement with the crossbreeding hypothesis. Distribution of shorter ROH may also suggest the presence of more ancient kinship, which is unaccounted in pedigree records due to the limitations in the extension of the recording processes (McQuillan et al., 2008). At K = 2, the locally adapted breeds displayed a taurine genome content ≥ 65%.This was expected since these breeds were brought to South America brought by European colonizers and were mainly formed by taurine breeds along their history.
At K = 3, two taurine genotypes were discriminated, whereas in K = 4, a third taurine ancestry could be found in the locally adapted breeds. This genetic background noticed in the adapted animals may be the result of the Iberian origin of these animals. At K = 4, a light green taurine ancestry was highlighted in the JER breed. This could be explained by the fact that this breed was only "discovered" around 200 years ago in the Jersey island, and this fact could have contributed to the maintenance of a conserved and distinct genetic composition of this breed, especially when compared with HOL, as also observed by Huson et al. (2020).
Moreover, the K = 5 highlighted the four taurine breeds showing three distinct taurine ancestries (ANG, yellow; BWS and HOL, red; and JER, light green) in addition to the supposed Iberian-derived taurine composition (purple) observed in locally adapted animals. This scenario is in agreement with cattle domestication events that occurred in Europe, in which European breeds would descend from domestication events of aurochs (Bosprimigenius) in the Near East and its further introduction throughout the European continent. Environmental conditions and domestication events could explain the diverse taurine backgrounds found in Europe. In this way, southern European breeds could have been more affected by introductions from northern Africa as well (Beja-Pereira et al., 2006).
At K = 5, the GIO breed showed mainly the red taurine composition (52%), while in the CAN breed, the taurine composition was shared by red, green, yellow, and purple taurine ancestries. GIO breed was developed for milk production in tropical and subtropical conditions through HOL with GIR breeds crossing followed by mattings schemes to establish the genetic composition of 5/8 HOL and 3/8 GIR . On the other hand, CAN breed was developed to generate beef cattle with higher growth rate and meat quality adapted to the tropical environment. CAN genetic background was formed by crossing taurine Charolais breed with indicine Guzerat, IND, and NEL breeds, with mattings being directed to set the genetic composition of 5/8 Charolais and 3/8 indicine (Barbosa, 2000).
At K = 5, we noticed the presence of a highly homogeneous indicine ancestry in the indicine breeds BRA, GIR, IND, NEL, SIN, and TAB. Although the importation of indicine breeds started approximately 150 years ago and have been intensively genetically improved in the last 40-50 years, we highlight that the indicine background is very conserved, especially in NEL and GIR breeds that showed ≥99% of indicine background. This information could be used to support international animal market between Brazil and foreign countries such as India and other tropical cattle producer countries that were recently interested in importing this indicineimproved breeds.
As expected, the locally adapted breeds CCB, CRI, CUR, FRA, MOC, and PAN showed a mostly taurine and mainly purple ancestry background, but also displayed indicine background ranging from 7 to 27%.The indicine background may have been incorporated due to recent crossing with indicine breeds in Brazil. One exception was the CCD breed that showed 95% of the purple taurine and only 1% of indicine backgrounds. Caracu lineages (dairy and beef) were originated by crossings between breeds brought from Iberian Peninsula to Brazil, including the Portuguese Minhota Breed. Natural selection throughout centuries enabled their adaptation to adverse conditions, contributing to their rusticity and high feed efficiency (3). In addition, these breeds were kept as purebred in regions of São Paulo state for meat production (CCB) and selected in Minas Gerais state for milk production (CCD), originating these two Caracu lineages. In a previous study of genetic diversity with locally adapted and zebu breeds, we also observed that CCD showed a homogeneous population (Campos et al., 2017), corroborating with the higher taurine background found in this study.
Moreover, our results of neighbor-joining analysis and PCAs reinforce the admixture results in the genetic discrimination among cattle breeds and their development history. From the neighbor-joining analysis, indicine, and taurine breeds are shown widely apart following the genetic distances. On the other hand, locally adapted breeds showed a closer genetic distance to taurine breeds, as predictable. The GIO breed clustered with taurine breeds and closer to HOL, while CAN clustered with the indicine breeds and closer to BRA breed, as expected. The PCAs also showed the same results overall, and highlighted the genetic proximity of the indicine breeds, as they clustered together in a close and narrowed span group. Thus, it is possible to assume that indicine breeds have kept their majority genetic background from their ancestors in India, although they have been extensively selected and genetically improved for important economical traits in Brazil.
In this study, we adjusted a linear model to analyze the equality of the LD curves across 19 cattle breeds raised in Brazil, which revealed different extents of LD. Our results suggest that different marker densities may be used for a giving accuracy of GS and GWAS in these breeds. We found that CUR breed showed the lowest N e value in the estimated generation 5, suggesting a recent inbreeding also reflected by the occurrence of longest ROH islands in our analyses. Thus, a special attention in matting schemes for this breed is needed. Genes located in ROH islands were evaluated and explored throughout their biological processes, which provided genetic insights about the breeds. We were successful in revealing the breed composition of all breeds besides the complex taurine and indicine compositions in all breeds. In addition to the LD clustering results, we suggest that taurine SNP genotyping arrays may also be used in locally adapted breeds, since they seem to share most of their genetic architecture and LD extent.
A general overview of the genetic architecture of 19 cattle breeds raised in Brazil was successfully obtained. Special attention should be given to CUR and JER breeds in order to improve matting schemes regarding the low values of N e . Even though BRA breed did not show the smallest N e in this study, it showed a significant decrease in N e , ranging from 123.53 at generation 50 to 14.3 at generation 5, and must be carefully examined by the current breeding programs. Candidate genes were found on ROH islands in various breeds, and their association with important traits were highlighted (e.g., ABCA7 with JER milk protein content; PENK with behavior fear response in BRA; SPP1 with mammary health status in IND; IFNAR1 and IFNAR2 with immune system features in TAB; SPEF2 with male infertility in both CCD and CCB; and PRLR with milk production in CCD; LRRTM1 and LRRTM4 with negative regulation of JAK-STAT cascade in CUR).

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material, further inquiries can be directed to the corresponding author.

ETHICS STATEMENT
Ethical review and approval was not required for the animal study because it used genotypes data provided.