Single-Nucleotide Polymorphism Variations Associated With Specific Genes Putatively Identified Enhanced Genetic Predisposition for 305-Day Milk Yield in the Girolando Crossbreed

Milk production phenotypes are the main focus of genetic selection in dairy herds, and although there are many genes identified as related to the biology of these traits in pure breeds, little is known about crossbreed animals. This study aimed to identify potential genes associated with the 305-day milk yield in 337 crossbreed Gir × Holstein (Girolando) animals. Milk production records were genotyped for 45,613 single-nucleotide polymorphisms (SNPs). This dataset was used for a genome-wide association study (GWAS) using the 305-day milk yield adjusted for the fixed effects of herd and year and linear and quadratic effects of age at calving (in days) and calving factor averaged per animal. Genes within the significant SNPs were retrieved from the Bos taurus ARS-UCD1.2 assembly (bosTau9) for gene ontology analysis. In summary, the GWAS identified 52 SNPs associated [p ≤ 10–4, false discovery rate (FDR) = 8.77%] with milk production, including NUB1 and SLC24A2, which were previously described as related to milk production traits in cattle. The results suggest that SNPs associated mainly with NUB1 and SLC24A2 could be useful to understand milk production in Girolando and used as predictive markers for selecting genetic predisposition for milk yield in Girolando.


INTRODUCTION
In tropical ecozones, the use of crossbred animals or synthetic breeds has become a viable alternative to overcome the numerous challenges of pasture-based dairy production systems. Girolando, a cross between Gir (Bos indicus) and Holstein (Bos taurus), has been selected as a suitable breed for dairy farms due to its high adaptability, milk production, and reproductive efficiency in tropical systems (Reis Filho et al., 2015).
Currently, most of the Brazilian dairy herds consist of Girolando cows with their various genetic groups defined based on pedigree-derived ancestry estimates, especially synthetic groups of 5/8 Holstein/Gir, and the breed accounts for over 80% of the milk production in the country (Canaza-Cayo et al., 2016). The heterosis (hybrid vigor) was important for the formation of Girolando and allowed the qualities present in the two breeds to be fixed in the cross.
Dairy fitness, which in turn results in higher milk production per animal, is a complex quantitative trait varying on a continuous scale. It is a worldwide concept that the large range of variation for quantitative traits is due to the number of genes involved in its expression. Thus, the occurrence of various genotypes within a population would reflect variable phenotypic expression subject to a significant effect of the environment on any individual production (Canaza-Cayo et al., 2018).
In this context, it is possible to carry out genomewide association studies (GWAS) to identify polymorphic chromosomal regions involved in complex traits (Jiang et al., 2019). GWAS allows the identification of DNA markers that strongly correlate to specific traits. DNA markers could be located within gene sequences, revealing candidates potentially involved with the expression of the phenotype of interest. The discovery of markers related to milk production can ensure the accuracy of breeding values and increase the understanding of the genetic control of important traits in economically desired phenotypes (Yin and König, 2019).
Despite the large number of genes previously described as related to the complex trait of milk production in pure breeds, such as Holstein and Jersey cows, very little has been reported in crossbreeds (Canaza-Cayo et al., 2014). Still, the identification of single-nucleotide polymorphism (SNP) affecting milk yield in tropical cattle is of paramount importance to accelerate the rate of genetic change in the dairy industry in developing countries (Daltro et al., 2019). Therefore, the current study aimed to identify SNP markers associated with the 305-day milk yield in Girolando bred and raised tropically in Brazilian dairy farms.

Ethics Statement
The study was conducted with data available with the Empresa Brasileira de Pesquisa Agropecuária-Gado de Leite (EMBRAPA Dairy Cattle), and thus, it is exempt from local ethical committee reviews. The Brazilian Association of Girolando Breeders collected phenotypic data and biological samples for animal genetic evaluation, and their members owned the participating herds and voluntarily consented to have their animals included in the study following best practices of veterinary care.

Single-Nucleotide Polymorphism Genotyping and Quality Control
A total of 337 Girolando cows were genotyped with the Illumina R BovineSNP50 v2 Genotyping BeadChip assay according to the manufacturer's protocol. Individuals that presented a call rate lower than 90% or that shared identically-by-state over 95% of their alleles with another sample were excluded. Only autosomal SNPs with unique genomic coordinates were analyzed, and markers were removed from the dataset if they did not present a minimum call rate of 90%, minor allele frequency of at least 2%, and Fisher's exact test p value for Hardy-Weinberg equilibrium greater than 1 × 10 −20 . These procedures were performed using customized scripts in R v.3.1.1 (available at 1 ) and the GenABEL v1.8 package (Aulchenko et al., 2007).

Phenotypic Data Collection and Milk Yield Adjustments
Records of 305-day milk yield (in kg) were available for the genotyped animals, comprising 832 observations. All records were pre-filtered for a minimum lactation period of 30 days and physiological interruption of lactation. First-lactation cows and records of interrupted lactations due to non-physiological reasons (death, loss of the quarter) were excluded from the data. The phenotypic data were adjusted for the fixed effects of herd and year and linear and quadratic effects of age at calving (in days) and days in milk. The total lactation (kg) was adjusted to 305 days by the following formula: (lactation length/305) × milk yield (Associação Brasileira de Criadores de Bovinos da Raça Holandesa [ABCBRH], 1986). The response variable used for associations was composed of the mean of all 305-day milk yield per cow adjusted for the aforementioned environmental effects. The average number of lactations observed per animal was 2.5 ± 1.5, with an interval from 2 to 9 lactations/animal, age at first calving (days) of 1,874.0 ± 758.7, and lactation length (days) of 254.3 ± 70.6.

Genome-Wide Association Analysis
In order to map putative quantitative trait loci (QTLs), associations between markers and phenotypes were tested using the following single-marker linear regression model: where y is the column vector of phenotypes, 1 n is a vector of 1 s, µ is the overall mean, b is a column vector of unobserved allele substitution effects, x is a vector of genotypes (coded as 0, 1, or 2 Illumina B alleles) relating observations in y to vector b, and e is the column vector of random residual effects, assumed e ∼ N 0, σ 2 e . This model was fitted using the ordinary least squares method. Markers were prioritized for functional annotation conditional on suggestive or significant values of p < 10 −4 and p < 10 −5 , respectively. The expected false discovery rate (FDR) was computed as: where k is the total number of markers being tested, α is the significance level adopted, and s is the number of markers declared significant at α. These procedures were performed using customized scripts in R v.3.1.1 [available at (see text footnote 1)] and the GenABEL v1.8 package [1].

Partitioning of Marked Variance
Ideally, the variance explained by causal quantitative trait nucleotides (QTNs) can be estimated via the linear model: x i b i is the vector of random breeding values. These values represent the sum of the effects of all inherited QTNs and are assumed g ∼ N 0, Kσ 2 g , where K is the kernel matrix of additive genetic relationships between pairs of individuals at QTNs, and σ 2 g is the variance component attributed to the causal variants (i.e., additive genetic variance). An alternative parameterization of this model under the Reproducing Kernel Hilbert Spaces (RKHS) framework considers the change of variable g = Kc, which results in c ∼ N 0, K −1 σ 2 c , where variance components σ 2 g and σ 2 c are interchangeable. As QTNs are not known in practice, genome-wide markers are often used as proxies to indirectly capture the effects of causal loci, and matrix K is computed from high-density SNP data. Different relationship matrices based on specific SNP sets, such as markers clustered by chromosomes, can also be computed to partition the additive genetic variance onto variance due to groups of markers [3]. Here, in order to estimate the variance due to putative major QTLs and remaining polygenic effects, we partitioned the additive genetic variance between genome-wide significant and non-significant markers by fitting, as described by Colli et al. (2018).
where subscripts QTL and G represent significant and nonsignificant genome-wide SNPs. Model parameters were estimated using the Gibbs sampling algorithm implemented in the BGLR v1.0.3 package in R [4]. Normal priors were assigned to random effects, and a flat prior was assigned to the overall mean. Variance components were assumed a priori scaled inverse chi-square distributed with ν = 5 degrees of freedom and scale parameter S = var y 1 − R 2 (ν + 2), where R 2 is the proportion of phenotypic variance a priori assigned to the random effects.
We assumed R 2 = 0.5 for the residual variance, and the unexplained variance was equally assigned to the remaining variance components. A single Markov chain with a length of 100,000 iterations was used. The burn-in period was set at 10,000 iterations and the thinning interval at 100 iterations.
For Gene Ontology analysis, genes within significant SNPs were considered; if the SNP was located in the intergenic region (i.e., not assigned to any gene), we selected the closest gene from the marker according to the ARS-UCD1.2 reference genome (Lehne et al., 2011;Buzanskas et al., 2017;Martínez et al., 2017;Cai et al., 2019). Gene names and coordinates were retrieved from the Ensembl Genes 101 database using the BioMart tool. SNP probe coordinates were migrated from UMD 3.1 (bosTau6) to ARS-UCD1.2 (bosTau9) assembly using UCSC liftOver 2 .

Genotype Quality Control and Phenotype Adjustment
From an initial set of 404 cows, 54,609 SNPs were collected. However, at best, 337 animals and 45,622 markers passed all filtering criteria. The average (± SD) number of parities observed per animal, age at calving (in days), and days in milk were 2.47 ± 1.47, 1,874.0 ± 758.72, and 254.3 ± 70.6, respectively. The averages (± SD) for minimum and maximum milk yields were 4,563.0 ± 2,280.83 kg and 316.0 ± 13.88 kg, respectively. Figure 1 shows the results of the single-marker linear regression analysis (p = 10 −4 and FDR 8.77%), indicating 52 significant SNPs. When the threshold was set to p = 10 −5 , seven significant markers remained (FDR 6.52%). Significant associations were detected in 18 out of the 29 bovine autosomes. Thirty-five of the 52 SNPs identified via the GWAS in Girolando cows are associated with previously described genes of the bovine genome (FDR 8.77% and  Table 2 presents the variance component estimates of the models, partitioning the additive genetic variance between markers declared significant at different levels (10 −4 and 10 −5 ) and the remaining markers. The top-scoring seven SNPs explained 14.4% of the adjusted phenotypes alone, whereas all 52 top-scoring SNPs together explained 28.7% of the variance in the adjusted phenotypes. Altogether, genome-wide SNPs could explain as much as 49% of the phenotypic variance.

DISCUSSION
Considering that bovine parity is a quantitative trait, it is very likely that the genes identified in association with 305-day milk yield in Girolando account for a small fraction of the total genetic variance for milk yield. Nevertheless, the findings reported here are essential in the context of the biology and functional characterization of milk production in Girolando, the most common industrial dairy crossbreed in Brazil. The SLC24A2 gene encodes member 2 of the solute carrier family 24 of transporter proteins. The SLC24A family comprises six mammalian protein subtypes that exchange cellular calcium and potassium with extracellular sodium (Altimimi et al., 2010). In humans, multiple variants of SLC24A2 have been described. The protein is considered to have a neuroprotective role and is found primarily in the retina and brain (Cuomo et al., 2007), additionally, in cattle, the process of sodium:calcium exchange in the photoreceptors was described in 1986 (Schnetkamp, 1986). However, there is no further information regarding the genetic role of SLC24A2 in milk production. Recent studies have demonstrated that the genes of the SLC24A family extrude calcium and potassium ions out of the cell by the entry of sodium ions (Altimimi et al., 2010). This process was also demonstrated in the uterine endometrium during the estrous cycle and pregnancy in pigs (Choi et al., 2014). In our study, two SNPs were linked to the SLC24A2 gene, which was significantly associated with milk production in the Girolando crossbreed (Table 1).
In humans and bovines, the NUB1 gene encodes a crucial ubiquitin-like protein that operates in cell cycle progression for tissue maintenance (Akey et al., 2002). The NEDD8 plays a vital role in cell cycle control and embryogenesis, and the AIPL1 is a chaperone controlling nuclear transport activity (Arcot et al., 2015;Lucariello et al., 2013). The gene NUB1 interacts primarily with multiple potential ubiquitin genes, and it has been demonstrated that NEDD8 is expressed in bovine milk somatic cells and that its expression increases gradually throughout the lactation cycle, doubling its expression from early to late lactation in Holstein cows (Wickramasinghe et al., 2012). Thus, NUB1 may play an essential role in controlling milk yield in cattle.
One relevant observation is that both NUB1 and SLC24A2 are primarily expressed within the photoreceptors of the retina (Akey et al., 2002). The interactions established by the proteins encoded by these two genes are essential for capturing light through the retina, and their role is dependent on their level of expression (Kirschman et al., 2010). According to Dawkins (2006), in the tropics, the light intensity remains relatively constant throughout the year but varies in temperate climates according to seasonal changes. In cattle, the response to light stimuli occurs through the hypothalamus-pituitary axis, and the eyes are the primary recipients of brightness (Valenzuela-Jiménez et al., 2015). Several authors (Yaegashi et al., 2012;Tsang et al., 2014) suggest that the manipulation of the photoperiod could be a useful variable for increasing milk production in cattle as it affects prolactin secretion in mammals. Valenzuela-Jiménez et al. (2015) found an increase of approximately 15% in the lactation of Holstein cows raised in a controlled environment with average daylight of 16 h when compared to an environment with a shorter photoperiod of 8 h. Likewise, Dahl et al. (2012) found a higher growth rate and increased milk production in the first lactation of heifers exposed to longer photoperiods.
The aforementioned reports support the role of NUB1 as a potential functional candidate gene for increased milk production in cattle, both by acting as a modulator of brightness perception in the retina and by controlling gene expression owing to ubiquitination. The SLC24A2 gene, also expressed in the retina, may be involved in light capture mechanisms, thereby influencing milk yield. Both mechanisms have been well-investigated in Holstein cows, a breed that contributes a substantial part of the Girolando's genomes. Thus, artificial selection may have indirectly contributed to the role of NUB1 and SLC24A2 in modulating milk production and yield in Brazilian climates, characterized by longer photoperiods throughout the year.
The MARCHF10 gene is a member of the MARCH family of membrane-bound E3 ubiquitin ligases to target lysines in substrate proteins and participates in signaling vesicular transport between membrane compartments (Morokuma et al., 2007). The KHDRBS3 has a role in blood-tumor barrier (BTB) that severely restricts the efficient delivery of antitumor drugs to cranial glioma tissues (Wu et al., 2019). The functional effects of MARCHF10 and KHDRBS3 have been described in humans, with no evidence of functions for cattle lactogenesis.
The 52 SNPs identified in the current GWAS with an FDR of 8.77% for the threshold at 10 −4 were linked to several genes that were not previously associated with milk production, and their cellular roles vary from DNA binding to transcription factors that regulate cell growth and proliferation (Poitras et al., 2008;Huang et al., 2011;Aslibekyan et al., 2012;Xiao et al., 2012;Lemos et al., 2016;Grigoletto et al., 2020), and an investigation of the effect of the beef fatty acid profile detected in Angus in the 778 K analysis revealed at least two biologically relevant genes, namely, KHDRBS and FAM135 (Table 1), previously associated with livestock feed efficiency, residual average daily gain, and adipogenesis (Seabury et al., 2017). The putative functional role of these genes in milk production, if any, remains to be identified. If identified, the GWAS results indicate the potential for SNP-based genomic selection for genetic improvement of Girolando crossbred cattle.
Our data suggest that NUB1 and SLC24A2 are important genes for understanding milk production in Girolando and lay a preliminary foundation for designing future follow-up studies regarding this trait in the crossbreed. In addition, the identified SNPs could be used as potential markers to putatively identify enhanced genetic predisposition for milk yield in the most common industrial dairy crossbreed in Brazil.

DATA AVAILABILITY STATEMENT
The data analyzed in this study was obtained from an agreement between Embrapa (Empresa Brasileira de Pesquisa Agropecuária), CRV, and Zoetis. Thus, legal and privacy restrictions prevent data from becoming publicly available. However, requests to access these datasets should be directed to Dr. MS at marcos.vb.silva@embrapa.br.

ETHICS STATEMENT
Ethical review and approval was not required for the animal study because the present study was exempt of the local ethical committee evaluation as genomic DNA was extracted from stored hair of animals from commercial herds. Written informed consent was obtained from the owners for the participation of their animals in this study.

AUTHOR CONTRIBUTIONS
ACr, DS, and ACa wrote the manuscript. ApC, CS, MS, and JG conceived the study. YU, ACa, and MS contributed to data production and quality control. FR, DS, and YU performed data analysis. LM and LF contributed to the interpretation of the results. ApC and MS provided samples or funded part of the analyses. All authors read, made corrections, contributed, and approved the manuscript.

FUNDING
This work was supported by the Goiás Research Foundation (FAPEG). Financial support for the publication of this manuscript has been provided by the Pontifical Catholic University of Goiás within its activities of promotion and dissemination scientific research.