Comparing Class II MHC DRB3 Diversity in Colombian Simmental and Simbrah Cattle Across Worldwide Bovine Populations

The major histocompatibility complex (MHC) exerts great influence on responses to infectious diseases and vaccination due to its fundamental role in the adaptive immune system. Knowledge about MHC polymorphism distribution among breeds can provide insights into cattle evolution and diversification as well as population-based immune response variability, thus guiding further studies. Colombian Simmental and Simbrah cattle’s BoLA-DRB3 genetic diversity was compared to that of taurine and zebuine breeds worldwide to estimate functional diversity. High allele richness was observed for Simmental and Simbrah cattle; nevertheless, high homozygosity was associated with individual low sequence variability in both the β1 domain and the peptide binding region (PBR), thereby implying reduced MHC-presented peptide repertoire size. There were strong signals of positive selection acting on BoLA-DRB3 in all populations, some of which were poorly structured and displayed common alleles accounting for their high genetic similarity. PBR sequence correlation analysis suggested that, except for a few populations exhibiting some divergence at PBR, global diversity regarding potential MHC-presented peptide repertoire could be similar for the cattle populations analyzed here, which points to the retention of functional diversity in spite of the selective pressures imposed by breeding.


INTRODUCTION
The major histocompatibility complex (MHC) plays an important effector role in the adaptive immune response (Rock et al., 2016). This particular system offers a unique opportunity for addressing functional and evolutionary diversity issues in many species (Trtkova et al., 1995). The MHC is formed by a group of loci encoding specific cell surface glycoproteins which are necessary for T-lymphocyte antigen peptide recognition (Rock et al., 2016). MHC class I proteins are expressed by all nucleated cells and are related to presenting antigens to CD8 + T-lymphocytes processed in the intracellular compartment, thereby eliciting cytotoxic responses (Neefjes et al., 2011;Rock et al., 2016). By contrast, MHC class II proteins are expressed by professional antigen-presenting cells and are associated with the presentation of extracellular antigen peptides to CD4 + T-lymphocytes, triggering cellular or humoral responses against various pathogens (Neefjes et al., 2011;Rock et al., 2016).
The MHC in cattle has been called the bovine leukocyte antigen (BoLA) and the genes encoding the expression of class II antigen presentation (DR and DQ)-related molecules are located in chromosome 23 IIa subregion (Ellis, 2004). BoLA-DR consists of the DRA monomorphic locus and three DRB loci, of which BoLA-DRB3-characterized by a high degree of polymorphism, with 330 different alleles reported to date-is the only one which has been described as functional (Maccari et al., 2017). Such polymorphisms are concentrated in the second exon which encodes the peptide binding region (PBR) β1 domain and has been used for determining BoLA-DRB3 alleles (Sigurdardóttir et al., 1991). Such high variability determines the amino acids (aa) forming PBR binding pockets, influencing the peptides presented on MHC for different alleles and setting different repertoires modulating the immune response (Ellis, 2004;Baxter et al., 2009). BoLA-DRB3 diversity could be used for estimating potential peptide-binding repertoire size, based on the assumption that highly divergent alleles are associated with broader peptide spectra (Klein et al., 2007).
The Simmental is a cattle (Bos taurus) breed that was selected in North America and Europe, mainly for increasing meat production efficiency (Amaya et al., 2020). The introduction of Simmental to Colombia 5 decades ago aimed to increase both milk and beef production by artificial insemination-based genetic improvement schemes using semen from proven bulls in North America and Europe (Amaya et al., 2020). Simbrah is considered a composite breed developed to combine Brahman cattle (Bos indicus) adaptability, maternal instinct, hardiness and disease resistance with Simmental fertility, milk production and beef quality (Goszczynski et al., 2018;Amaya et al., 2020). Most tropical countries where Simbrah cattle occur have chosen a different breeding strategy, producing animals with different percentages of Zebuine genes, ranging from 1/4 (25%) to 5/8 (62.5%) based on the requirement of particular features, such as better adaptation to humid environments (Agung et al., 2016).
Despite recent advances in exploring BoLA-DRB3 genetic diversity in cattle, a significant amount of breeds and crossbreeds still remain uncharacterized (Takeshima et al., 2003;Giovambattista et al., 2013;Takeshima et al., 2014;Takeshima et al., 2015;Takeshima et al., 2018). The aim of this study, therefore, was to describe for the first time BoLA-DRB3 genetic diversity in the Colombian Simmental breed and its common zebuine cross, Simbrah, comparing it with that of worldwide taurine and zebuine breeds to assess the impact on potential peptide-binding repertoire size and divergence. Such new MHC diversity information will assist in introducing appropriate breeding schemes, guiding further MHC studies.

Study Population and DNA Extraction
Whole blood was collected from the coccygeal or jugular veins of 130 Simmental (N = 67; 5 farms) and Simbrah cattle (N = 60; 5 farms) (Supplementary Data S1), stored in EDTA-containing vacutainer tubes. Bovines were selected from extensive production systems from Colombia's main breeding regions characterized by a reduced number of purebred animals per farm. The herds and purebred animals analyzed were randomly sampled avoiding related individuals. Genomic DNA (gDNA) was extracted using the PureLink Genomic DNA Mini Kit (Invitrogen, Carlsbad, CA, United States) and following the manufacturer's instructions. Previous allelic richness data (Greenbaum et al., 2014) of 14 taurine and zebuine populations from Asia, South America and Europe were included for comparison ( Table 1). This study was carried out following the protocol approved by the Universidad de Ciencias Aplicadas y Ambientales' (U.D.C.A) Animal Research Ethics Committee (minutes No.201901).

DNA Amplification and Sequencing
BoLA-DRB3 exon 2 was amplified with primers DRB3F (5′-TCC CGCATTGGTGGGTGT-3′) and DRB3R (5′-CTCCACACT GGCCGTCCAC-3′) (Ledwidge et al., 2001). The PCR mixture contained 1X Pfx amplification buffer, 300 mM of each dNTP, 0.45 mM of each primer, 1 mM MgSO 4 , 1 U Platinum Pfx DNA Polymerase (Invitrogen) and 50 ng gDNA, in a 50 ml final volume. Two independent reactions were performed for each sample, following previous recommendations to avoid chimeric product formation (Lenz and Becker, 2008). The thermal profile consisted of a denaturation step at 94°C for 5 min followed by 30 cycles of 94°C for 30 s, 64°C for 30 s and 68°C for 1 min, with no final extension. Wizard SV Gel and PCR Clean-Up System (Promega, Madison, WI, United States) were used for purifying the amplicons according to the manufacturer's instructions prior to sequencing both directions using the BigDye Terminator Kit.

Sequence Data Analysis
CLC Main Workbench (CLC bio, Aarhus, Denmark) was used for assembling and editing each sequence independently. Polymorphic positions were recognized for producing a final consensus sequence containing IUPAC ambiguity codes. HAPLOFINDER (Miltiadou et al., 2003) was used for assigning each animal's genotype by comparing the obtained sequences to the BoLA-DRB3 allele sequences reported in the IPD-MHC database (Maccari et al., 2017), as previously indicated (Baxter et al., 2008).

Sequence Diversity, Hardy-Weinberg Equilibrium and Selection Signatures
The number of alleles (N a ) and allele frequencies were manually obtained by direct counting using the maximum likelihood method for estimating standard errors for allele frequencies according to Li (Li, 1976). Nei and Chesser's method (Nei, 1978) was used for calculating the observed heterozygosity (h o ) and ARLEQUIN v.3.5 (Excoffier and Lischer, 2010) for estimating the unbiased expected heterozygosity (h e ). The correlation between sample size and these estimators was used for assessing their dependence. Allele richness (a measure of the average amount of alleles per locus) was also used for comparing the number of alleles found in each population independently from sample size (Greenbaum et al., 2014). The F IS index (Weir and Cockerham, 1984) was estimated for determining potential departures from Hardy-Weinberg equilibrium using the exact test of significance implemented in GENEPOP v.4.5.0 (Rousset, 2008). GENEDOC v.2.7 (Nicholas, 1997) was used for calculating identity and similarity percentages (as assessed by the BLOSUM62 substitution matrix) for all genotypes observed within populations for the whole β1 domain and PBR positions [considering the previously reported 31 putative positions constituting the MHC-DRB PBR (Suárez et al., 2006)]. PBR sequence logos, along with PBR sequence correlation between populations, were analyzed to further evaluate differences in potential MHC-presented peptide repertoire among cattle populations. MEGA X (Kumar et al., 2018) was used for calculating the average amount of synonymous (d S ) and nonsynonymous (d N ) substitutions per site by Nei-Gojobori's method with Jukes-Cantor correction. The Z-test was used for assessing d N /d S ratio significance. Codons subject to positive selection were inferred using maximum likelihood (FEL (Kosakovsky Pond and Frost, 2005)) and Bayesian (MEME (Murrell et al., 2012)) methods using the Datamonkey web server (Delport et al., 2010), using a p-value <0.1 as significant threshold.

Population Structure, Genetic Differentiation and PBR Similarity Correlation
Population structure and genetic differentiations between populations were evaluated by calculating pairwise F ST statistics (Weir and Cockerham, 1984) using ARLEQUIN. POPTREE v.2 (Takezaki et al., 2010) was used for calculating genetic distances (D A ) (Nei, 1978) from allele frequency data and PAST v.3.2 (Hammer et al., 2001) for generating an allele frequency-based principal component analysis (PCA). WebLogo web server (Crooks et al., 2004) was used to construct a logo for PBR positions for each population using BLOSUM62 substitution matrix for each position including all alleles observed within populations. A 620-coordinate vector (31 positions x 20 possible aa genotypes) representing each PBR population was used for calculating Pearson correlation coefficients using R package amap (Ihaka and Gentleman, 1996), while R package ape was used to construct an UPGMA dendrogram (Paradis et al., 2004). Mitochondrial DNA D-loop sequences from individuals of several of the populations analyzed here were recovered from GenBank (Supplementary Data S2) and used for evaluating MHC allele distribution based on genetic
Genetic distance D A clustering was similar to that based on pairwise F ST among Holstein and the Philippine populations ( Figure 1B). The latter group was well-differentiated from Colombian Simmental, Colombian Simbrah, Bolivian Yacumeño and Colombian Normande populations on the basis of D A but not on pairwise F ST . Mean genetic distance values for these breeds indicated that common alleles could explain a large amount of their cumulative allele frequency. Thus, the Holstein population group (0.078 mean D A distance) had 28 alleles in common, accounting for 95.2% of their mean cumulative allele frequency. Ten of these common alleles (BoLA-

DISCUSSION
The MHC influences susceptibility and resistance to infectious diseases, vaccine responses and productions traits. MHC allele distribution information can be used to guide resourceconsumption studies (such as immunopeptidomic or binding affinity assays) aimed at identifying MHC-associated peptides and developing in silico binding predictive algorithms that can be used for understanding and predicting immune response patterns (Nielsen et al., 2018;Rappazzo et al., 2020). MHC high polymorphism can also provide insights into populations evolutionary history. Nevertheless, MHC diversity has only been explored in a few cattle breeds to date (Takeshima et al., 2003;Giovambattista et al., 2013;Takeshima et al., 2014;Takeshima et al., 2015;Takeshima et al., 2018). In this study, we have first characterized BoLA-DRB3 genetic diversity in the taurine Simmental breed and in its most common cross with zebuine cattle in tropical regions, the Simbrah in Colombia. Considering the recent origin of Colombian Simmental and Simbrah cattle, a large percentage of highly related animals is expected since small herds are derived from few parents, thus reflecting a potentially reduced genetic diversity, a condition previously found for other pure cattle breeds in the country (Amaya et al., 2020;Bohórquez et al., 2020).
Cattle populations varied considerably in terms of allele richness and Hardy-Weinberg equilibrium. It is worth noting that unequal sample size have no significant impact on genetic diversity estimates such as h o (unbiased parametric value estimator, mainly determined by the sampling method used (Nei and Chesser, 1983;Bohórquez et al., 2020)), h e or R s , also when the pertinent corrections were applied (Nei, 1978;Leberg, 2002). Simmental and Simbrah R s and h e were among the highest values, similar to those of Normande and Philippine populations. However, differences in R s (or N a ) associated with similar h e values across populations (as observed for Simmental and Simbrah) indicated a marked allele frequency distribution variation, while h o values were the lowest for Colombian cattle, with F IS indices being thereby the highest for these populations. Population sub-structuring may have reduced h o [by means of the Wahlund effect, i.e., reduced observed heterozygosity in a population caused by subpopulation structure (De Meeûs, 2017)], thus magnifying allele frequency differences compared to those found in other populations.
Natural selection and random genetic drift (notoriously exerting higher effects in populations with smaller effective population size) are factors affecting allele frequency distribution (Nei and Tajima, 1981;Tajima and Nei, 1984;Akashi et al., 2012;Husemann et al., 2016), which in the case of the MHC is expected to reflect balancing selection with heterozygote excess (Hughes and Nei, 1988;Hughes and Yeager, 1998;Takeshima et al., 2008). Even though F IS indicated heterozygote deficiency for BoLA-DRB3, the Frontiers in Genetics | www.frontiersin.org February 2022 | Volume 13 | Article 772885 8 factors leading to increased homozygosity (Crow, 2010) as well as small effective population sizes and inbreeding, the use of only one locus hampered from singling out the underlying cause of such pattern. Likewise, the high F IS values for some of these populations, indicating higher homozygote percentages than those expected for Hardy-Weinberg equilibrium, points to the occurrence of evolutionary forces acting on these populations. Therefore, further studies based on genome-wide data including both neutral and non-neutral loci, are necessary to get a comprehensive picture of the evolutionary forces acting on this system, and of their relative contributions [in Simmental, selection based on production and its small effective population size should be taken into account (Amaya et al., 2020;de Araujo Neto et al., 2020)].
Alleles fell into different categories based on their distribution throughout the populations tested. The first category consisted of alleles widely distributed in populations from different continents and often displaying relatively high frequencies. These alleles, such as BoLA-DRB3*011:01, are possibly present in all populations, or absent in a few of them, such as BoLA-DRB3*010:01, 012:01 and 014:01:01. Considering that taurine and zebuine cattle were domesticated in more than two independent events (Beja-Pereira et al., 2006;Decker et al., 2014), these alleles probably predate Bos primigenius divergence which gave rise to these cattle types. Such alleles might have been either present in just a subset of the founder populations or ubiquitous before undergoing secondary loss due to random genetic drift and/or natural selection (alternatively, their very low frequencies impaired their sampling) (Barton, 1996;Sutton et al., 2011). Another category consisted of alleles, such as BoLA-DRB3*017:01, 006:01, 009:01 and 027:10, found predominantly in taurine or zebuine cattle with their presence in the other type of cattle populations being possibly indicative of admixture. The last category includes the alleles found exclusively in some populations and displaying low frequencies, possibly representing the most recently arisen ones (Uinuk-Ool et al., 2002). Despite forming just a moderate proportion of known BoLA-DRB3 alleles found in and Simbrah cattle (67 and 60 out of 330, respectively) (Maccari et al., 2017), they occur with significant frequency in other cattle populations, representing the major allele variants (Bohórquez et al., 2020) (Supplementary Data S3, Supplementary Data S4 and Supplementary Data S6). Nevertheless, some alleles contributing towards the distinction of Colombian Simmental from Simbrah were also significant in differentiating zebuine from taurine cattle, such as BoLA-DRB3*015:01 and 022:01 frequently occurring in taurine and zebuine cattle, respectively (Takeshima et al., 2018). Alleles private to Simmental or Simbrah further contributed to their differentiation.
Although most of the target bovine populations analyzed here were genetically well-differentiated based on BoLA-DRB3, others had a shallow structure due to the sharing of several alleles occurring with high or intermediate frequency. These populations clustered into five groups according to the measures of differentiation used. Low mean distance values indicated high genetic affinity for these populations and more detailed analysis showed that commonly occurring alleles accounted for a large percentage of their mean cumulative allele frequency. Several factors may result in weak structure. For instance, the limited genetic distance between Colombian Simmental and Normande might be due to sample origin, as geographical dispersal patterns in cattle reflects those of exportation and co-migration in humans (Decker et al., 2014), as well as similar selection pressure (Bohórquez et al., 2020). Furthermore, as zebuine introgression occurred independently in American and Indian cattle (Decker et al., 2014), crossbreeding with Brahman may have led to low genetic differentiation between Simbrah and Philippine populations. These results contrasted with the weak differentiation in Holstein cattle, such breed forming a very compact group in spite of multiple sample origins (Takeshima et al., 2015), possibly as the result of intense selective pressure regarding milk production traits (Bohórquez et al., 2020) and a high level of gene flow via genetic improvement strategies, thereby leading to a high degree of homogenization (Spieth, 1974;Leroy et al., 2013).
MHC PBR positions hosted the highest β1 domain variability associated with the peptides to which an allele could bind (Hughes and Nei, 1989;Stern et al., 1994). The similarity/ identity matrix of the MHC-DRB PBR position across cattle populations suggested potential differences in MHC-presented peptide repertoire size ( Table 4). Colombian populations had the highest identity and similarity values in both β1 domain and PBR. This could be due to high overall homozygosity but also to a limited variability at these specific loci and suggested that these animals had smaller individual MHC-presented peptide repertoires. Nevertheless, the good correlation regarding PBR aa sequence and logo analysis may suggest that potential population-related MHC-presented peptide repertoire diversity could be equivalent among all cattle populations analyzed here, with just a few groups displaying a much higher variability. This implies that breeding is unlikely to have decreased functional MHC variability, which bears important implications for peptidebased vaccine design, so that different cattle populations could be targeted using similar peptide combinations. Although it has been shown that decreased MHC variability might be caused by population bottlenecks (Bollmer et al., 2011;Zhang et al., 2016), balancing selection driven by pathogens can still maintain a high degree of diversity (Aguilar et al., 2004;Newhouse and Balakrishnan, 2015). Moutou et al., have shown that functional polymorphism may be lower than genetic polymorphism in pigs (Moutou et al., 2013). The highest similarities within and between porcine populations were mainly due to high correlation in PBR Pocket 1 and 4, whereas a higher divergence was observed for Pocket 6, 7 and 9. This could have arisen from Pocket 1, 4 and 6 aa preferences as anchor positions (Rappazzo et al., 2020). This work has shown that, in spite of high values for some genetic diversity measures regarding BoLA-DRB3 (such as allele richness and expected heterozygosity), Colombian Simmental, Simbrah and other cattle populations may have a limited potential MHCpresented peptide repertoire diversity could be similar among all cattle populations analyzed and that breeding did not decrease functional diversity. Additional analyses directly addressing peptide repertoire diversity are needed to confirm these results.

ETHICS STATEMENT
The animal study was reviewed and approved by the Universidad de Ciencias Aplicadas y Ambientales' (U.D.C.A) Animal Research Ethics Committee (minute No. 201901). Written informed consent was obtained from the owners for the participation of their animals in this study.

FUNDING
This study was funded by the Fundación Instituto de Inmunología de Colombia (FIDIC).