Genomic prediction based on a joint reference population for the Xinjiang Brown cattle

Introduction: Xinjiang Brown cattle constitute the largest breed of cattle in Xinjiang. Therefore, it is crucial to establish a genomic evaluation system, especially for those with low levels of breed improvement. Methods: This study aimed to establish a cross breed joint reference population by analyzing the genetic structure of 485 Xinjiang Brown cattle and 2,633 Chinese Holstein cattle (Illumina GeneSeek GGP bovine 150 K chip). The Bayes method single-step genome-wide best linear unbiased prediction was used to conduct a genomic evaluation of the joint reference population for the milk traits of Xinjiang Brown cattle. The reference population of Chinese Holstein cattle was randomly divided into groups to construct the joint reference population. By comparing the prediction accuracy, estimation bias, and inflation coefficient of the validation population, the optimal number of joint reference populations was determined. Results and Discussion: The results indicated a distinct genetic structure difference between the two breeds of adult cows, and both breeds should be considered when constructing multi-breed joint reference and validation populations. The reliability range of genome prediction of milk traits in the joint reference population was 0.142–0.465. Initially, it was determined that the inclusion of 600 and 900 Chinese Holstein cattle in the joint reference population positively impacted the genomic prediction of Xinjiang Brown cattle to certain extent. It was feasible to incorporate the Chinese Holstein into Xinjiang Brown cattle population to form a joint reference population for multi-breed genomic evaluation. However, for different Xinjiang Brown cattle populations, a fixed number of Chinese Holstein cattle cannot be directly added during multi-breed genomic selection. Pre-evaluation analysis based on the genetic structure, kinship, and other factors of the current population is required to ensure the authenticity and reliability of genomic predictions and improve estimation accuracy.


Introduction
Xinjiang Brown cattle is a major breed supporting the development of the cattle industry in Xinjiang, it was the first breed of cattle used for milk and meat purposes after the founding of the People's Republic of China.In 2023, the number of Xinjiang Brown cattle in stock reached 1.16 million; however, the level of breed improvement was low, with a performance measurement population of <10,000.Therefore, it is important to establish an efficient genomic evaluation system for Xinjiang Brown cattle to improve their genetic level.The application of genome selection technology has significantly enhanced the efficiency of genomic evaluation (Hayes et al., 2016).Because of the implementation of the genome selection for Chinese Holstein cattle in 2008, early and accurate selection of calves and young cattle has been achieved (George et al., 2017), leading to higher accuracy in genomic evaluation and more precise assessment of individual breeding value (Weigel et al., 2010;Dassonneville et al., 2011).In addition, due to early selection and higher accuracy, the rate of genetic progress has doubled (Weller et al., 2017), improving breeding profitability and significantly reducing breeding costs.Although genome selection has been successfully applied to Chinese Holstein cattle population, the low level of production performance measurement and small population size of Xinjiang Brown cattle have hindered the application of genome selection technology.To improve the reliability of genomic predictions, especially for smaller populations, many feasible methods have been proposed, including increasing marker density, constructing linkage disquilibrium (LD) with more markers and causal mutations, and simulation data analysis (de Roos et al., 2009).Simulation and real data analyses (BrØndum et al., 2015) have shown that genomic prediction can play an important role in different populations.
For genome selection, it is necessary to have a reference population with sufficient size and an appropriate genetic structure that simultaneously incorporates genomic and phenotypic information to accurately predict genome estimated breeding values (GEBVs) (Metta et al., 2004;Boichard et al., 2016).Genome selection has recently been widely used in dairy cattle breeding programs.However, its application is limited to populations with a small number of breeds.Establishing a sufficiently large reference population is the most limiting factor for the accurate estimation of SNP effects (Boichard et al., 2016).When conducting genome selection for small populations, the most direct approach to enhancing its reliability is to expand the reference population.Many countries have found effective solutions through international cooperation, leading to joint genomic evaluations (Lund et al., 2011).By connecting France, Germany, Austria, Italy, Slovenia, Switzerland, and the United States of America to the InterGenomics consortium operated by the Interbull Center (Zumbach et al., 2010;Jorjani et al., 2012), genome-wide joint evaluations have been conducted for Brown Swiss bulls and Simmental cattle in Germany and Austria (Edel et al., 2011).Research has shown that by combining different populations of the same breed or related breeds in the reference population, more effective information can be obtained for estimating marker effects.Therefore, more accurate breeding predictions can be obtained from genomic predictions.Accuracy is improved when three related dairy cattle populations, Danish Red, Swedish Red, and Finnish Ayrshire, are combined into a single reference population (Zhou et al., 2014).When four European Holstein populations were combined into a reference population, the reliability increased by 10% (Lund et al., 2011).By combining six Brown Swiss populations, the reliability increased from 6% to 45% (Jorjani et al., 2012).However, multi-breed genomic evaluation of Xinjiang Brown cattle has not yet been conducted, limiting the optimized utilization of genomic selection technology in their genomic evaluation.
Based on the research foundation for domestic and international multi-breed joint genomic evaluation (Pryce et al., 2011;Lund et al., 2014;Steyn et al., 2019;Xu et al., 2019;Palombo et al., 2021), we proposed to integrate Xinjiang Brown cattle and Chinese Holstein cattle to construct a joint reference population for genome selection.In order to expand the Xinjiang brown cattle genome selection reference group, so as to apply multi-breed genome selection in Xinjiang brown cattle population to improve the prediction reliability.This study aimed to analyze the genetic structures of Xinjiang Brown and Chinese Holstein cattle to establish a multibreed joint reference population.Using a dual-trait single-step genome-wide best linear unbiased prediction (ssGBLUP) approach, we established a genomic evaluation system for the primary lactation traits of Xinjiang Brown cattle, leveraging the joint reference population of Xinjiang Brown and Chinese Holstein cattle.This improves the accuracy of genomic selection for Xinjiang Brown cattle, creating a core breeding herd of genetically superior dairy Xinjiang Brown cows.Consequently, the genetic improvement of Xinjiang Brown cattle population will be expedited, leading to enhanced genetic levels across the breed.

Sample collection and DNA extraction
A total of 1,729 blood samples were collected from the tail vein of Xinjiang Brown cattle and added to 10 mL EDTA anticoagulant tubes.The samples were then aliquoted into 1.5 mL centrifuge tubes and stored at −20 °C.In addition, 66 frozen semen samples were collected from Xinjiang Brown and Brown Swiss bulls used for the artificial insemination of Xinjiang Brown cattle after 1983.
DNA was extracted from the above samples, and the concentration and purity of the obtained genomic DNA were measured using a NanoDrop 2000c spectrophotometer.The OD260/OD280 ratio was 1.7-1.9,indicating good DNA quality.After assessing DNA concentration, purity, and integrity, the samples were stored at −20 °C (Ma, 2015).Beijing, distributed across 18 farms in the region.All of these animals were detected using the Illumina GeneSeek GGP bovine 150 K chip.

Chip imputation and quality control
A total of 139,376 and 138,892 SNP markers were detected using Xinjiang Brown and Chinese Holstein cattle chip assays, respectively.These data were imputed using Beagle 4.1 software, which infers haplotypes present in the population based on the principle of linkage disequilibrium.To ensure the accuracy of imputation, quality control measures were applied to the chip data.
The quality control criteria were as follows: individuals with a genotyping call rate of <90% were excluded.Only SNPs on chromosomes 1-30 were retained, with an individual genotype missing rate of <10%.SNPs with a minor allele frequency of >0.01 and a Hardy-Weinberg equilibrium p-value >1 × 10 −6 were also included.After quality control using the PLINK software, the SNP genotypes were converted to a 0, 1, and 2 format.Finally, 118,622 and 123,268 SNP markers on the autosomal chromosomes of Xinjiang Brown and Chinese Holstein cattle were retained, respectively.Because the number of SNP markers differed between the two breeds after quality control, an intersection of the SNP markers was taken, which resulted in 118,021 common SNP markers for both breeds (Figure 1).

Linkage disequilibrium analysis
The.map and.ped files for both breeds were converted to.vcf format using PLINK.The PopLDdecay software was then used to analyze and plot LD decay graphs (https://github.com/BGIshenzhen/PopLDdecay)(Zhang et al., 2019).The LD metric, r 2 , was calculated for the four populations (Hill, 1974).The mean r 2 value was computed at various marker distances of 1 Kb to demonstrate the degree of LD decay across different populations.

Population structure analysis
To infer ancestral populations based on the allele frequencies of descendant individuals, an unsupervised algorithm was employed (Consortium et al., 2009).In this study, genome-wide SNP data were used to calculate the population structure for ancestral admixture components with K values of 2-4 using admixture (Alexander et al., 2009).Visualization of the population structure was performed using the R package "pophelper".
Analysis was conducted using the FastTree software (http:// www.microbesonline.org/fasttree/),with the maximum likelihood method adopted for estimation.The Jukes-Cantor + CAT model was used as the default model for nucleotide phylogeny.The credibility of the phylogenetic tree branches was tested using 1,000 bootstrap replicates.Finally, the FigTree software was used for visualization.

Principal component analysis
The Gmatrix package in R was used to calculate the genomic kinship matrix (G-matrix) for Xinjiang Brown and Chinese Holstein cattle.Subsequently, principal component analysis (PCA) was performed using the G-matrix.The first three eigenvectors (PCA1, PCA2, and PCA3) were extracted and used as the horizontal and vertical coordinates for plotting.The contribution rates of the principal components were calculated on the basis of the percentage of eigenvalues.Finally, visualization was performed using the R language.
2.5 Multi-breed genomic evaluation using a joint reference population

Phenotypic data processing
The data for Xinjiang Brown cattle include production performance measurement records from 1983 to 2018 and DHI measurement records from 2010 to 2017.The data for Chinese Holstein cattle include DHI measurement records from 2001 to 2019.Milk-related traits, including 305-day milk yield (305dMY), milk fat yield (MFY), milk protein yield (MPY), and somatic cell score (SCS), were obtained through collation (Table 1).There were 7,516 and 93,717 milk trait measurements recorded for Xinjiang Brown and Chinese Holstein cattle, respectively.
The pedigree file used to analyze Xinjiang Brown cattle had 16,795 cattle, including 676 breeding bulls.Among these bulls, one had a maximum of 619 offspring, whereas 221 had only one offspring.Among the female adult cattle, 583 had only one offspring, whereas 1,623 had two or more offspring, with a maximum of 12 offspring per individual.
For the Chinese Holstein cattle, the pedigree file used for the analysis contained 6,54,390 individuals, including 11,243 breeding bulls.Among these bulls, one had a maximum of 7,884 offspring, whereas 4,695 had only one offspring.Among the female adult cattle, 1,63,781 had only one offspring, whereas 1,11,912 had two or more offspring, with a maximum of 12 offspring per individual (Table 2).

Genotype data
Genotype data for 403 female Xinjiang Brown cattle, 71 male Xinjiang Brown cattle, and 11 male Brown Swiss cattle was considered.In addition, 2,100 Chinese Holstein cattle were randomly selected (According to PCA and Admixture results, PLINK software was used to remove the chip data of Chinese Holstein cows that was inconsistent with the large population of Chinese Holstein cows).

Statistical analysis
In this study, the ssGBLUP method was used to construct the H-matrix based on the pedigree and genomic information from Xinjiang Brown and Chinese Holstein cattle.The two-trait model Bayesian approach was used to estimate the variance components and breeding values for each trait.
To investigate the suitable integral ratio of the Chinese Holstein cattle in the joint reference population, a random gradient grouping approach was applied.The population was gradually accumulated in increments of 300 individuals to construct the joint reference population.A control group was established by excluding the phenotypic and genomic information of the Chinese Holstein cattle (Table 3).
Because of the significant differences in milk production traits between Xinjiang Brown and Chinese Holstein cattle (Zhang et al., 2021), a dual-trait animal model was constructed.In this model, each biological trait was treated individually in the two populations, accounting for potential scale inconsistencies that may arise during breeding value estimation due to standardization across different breeds.The milk production traits (305dMY, MFY, MPY, and SCS) of Xinjiang Brown and Chinese Holstein cattle were considered to be two separate traits.A dual-trait linear model was used to estimate the variance components for milk production traits based on the genomic-pedigree combined relationship matrix, H-matrix.The model is described as follows: In the formula, y 1 represents the observation value vector of a certain milk trait for each of Xinjiang Brown cattle, and y 2   represents Chinese Holstein cattle.β 1 and β 2 represent the fixed effect vectors for the same milk trait of Xinjiang Brown and Chinese Holstein cattle, respectively, including farm effect, calving year effect, calving season effect, and parity effect.The farm effect was divided into 20 levels based on the phenotypic data sources of the farms where the cattle were raised.The calving years of Xinjiang Brown cattle were divided into seven levels based on phenotypic records: 1985-1995, 1996-2000, 2001-2005, 2006-2008, 2009-2011, 2012-2014, and 2015-2018.The calving years of the Chinese Holstein cattle were divided into six levels: 2001-2005, 2006-2008, 2009-2011, 2012-2014, 2015-2018, and 2019.For the parity effect, Xinjiang Brown and Chinese Holstein cattle were classified into six levels: 1, 2, 3, 4, 5, and 6 (including those with more than six calves).The calving season effect of Xinjiang Brown cattle was divided based on the unique climatic conditions of Xinjiang.According to the method of temperature intervals, April and May were considered spring, June, July, and August were considered summer, September was considered autumn, and January, February, March, October, November, and December were considered winter.Conversely, the calving season effect of Chinese Holstein cattle was determined based on the climatic characteristics of Beijing.Following the same method of temperature intervals, March, April, and May were considered spring, June, July, and August were considered summer, September, October, and November were considered autumn, and December, January, and February were considered winter.a 1 and a 2 represent the individual additive genetic effect vectors for a certain milk trait of Xinjiang Brown and Chinese Holstein cattle, respectively.e 1 and e 2 represent the random residual effect vectors for the same milk trait of Xinjiang Brown and Chinese Holstein cattle, respectively.X i and Z i represent the incidence matrices for the fixed effects and individual random additive genetic effects of the i-th trait, respectively.Assume , In the aforementioned formula, H represents the combined genomic-pedigree relationship matrix.σ 2 αi denotes the additive genetic variance for the i-th breed, while σ α1α2 represents the covariance between breeds.σ 2 ei stands for the residual variance of the i-th breed.Given that Xinjiang Brown and Chinese Holstein cattle are reared in separate populations, there is no residual covariance between the two groups.
The genetic variance-covariance structure of the ssGBLUP additive genetic effect model is represented by a ~N(0, Hσ 2 a ), where σ 2 a denotes the additive genetic variance.H, the pedigree-genome relationship matrix, represents a combination of the pedigree-based additive genetic relationship matrix (A matrix) and the genome-based kinship matrix (G-matrix) (Aguilar et al., 2010;Christensen and Lund, 2010).
The formula used to compute H is as follows: Subscripts 1 and 2 in A represent the non-genotyped and genotyped animals in the population, respectively.G represents the genetic relationship matrix.The calculation formula is as follows: . M represents the association matrix for SNP effects, where the elements (0 − 2p j ), (1 − 2p j ), and (2 − 2p j ) represent homozygous 11, heterozygous 12 or 21, and homozygous 22 genotypes, respectively.p j represents the minor allele frequency of the jth SNP, and m represents the number of markers.p k denotes the allele frequency of the Kth SNP.Therefore, the H −1 formula is given by H

22
: where A −1 represents the inverse matrix of all pedigree relationships, G −1 represents the inverse matrix of genomic kinship relationships, and A −1 22 represents the inverse matrix of the pedigree relationships for the sequenced individuals.The Bayes method was calculated using the GIBBS1F90 module in the BLUPF90 software along with the Bayes-Gibbs sampling method.In the Bayes method, the total chain length of the samples was 100,000, the burn-in chain length was 10,000, and the thinning interval was 50.The Geweke diagnostic method in POSTGIBBSF90 was used to check the convergence of the Gibbs chain (Zhang et al., 2022).

Calculation of heritability
The calculation formula of heritability is as follows: The formula for the standard error of heritability is shown below: where h 2 is heritability, SE 2 (h 2 ) is the standard error of heritability, σ 2 a is the additive genetic variance, σ 2 e is the residual variance.σ 2 p is the overall phenotypic variance, σ 2 p σ 2 a + σ 2 e .

Verification of the reliability of genomic breeding values
To verify the accuracy of the estimated genomic breeding values for the joint reference populations, 50 offspring individuals born in the past 4 years from 485 genotyped Xinjiang Brown cattle served as a validation group.Genomic predictions were performed in two groups: with and without excluding the phenotypic data of the validation group.This resulted in 16 sets of genetic parameters and genomic estimated breeding values for each milk trait.By comparing the prediction accuracy, estimation bias, and inflation coefficient of the validation group, the optimal number of joint reference populations was determined.
To calculate the prediction accuracy of the genomic estimated breeding values, the correlation coefficient between the genomic breeding values calculated with the phenotypic data of the validation group and those calculated without these data was used to measure the accuracy of estimating genomic breeding values for different joint reference populations.The formula is as follows: R GEBV Cor(TBV*, GEBV), where R 2 represents reliability or the square of accuracy.
Meanwhile, the regression of TBV* on GEBV is calculated using the formula TBV* b 0 + b 1 GEBV, where the regression coefficient b 1 is the inflation coefficient, and the intercept b 0 is the estimation bias (Legarra and Reverter, 2019).In this context, b 1 represents the regression coefficient, which has the following implications: when b 1 < 1, GEBV is inflated, indicating that Var (GEBV) is too large.This means that in the genomic breeding values, the good ones are even better, and the bad ones are even worse.Conversely, when b 1 > 1, GEBV is deflated, suggesting that Var (GEBV) is too small.This indicates that the genomic breeding values are smaller than the true values and are contracted toward the middle.

Results
3.1 Genetic structure

Linkage disequilibrium analysis
Linkage disequilibrium analysis was conducted between the two breeds by calculating the linkage disequilibrium coefficients for the two loci and plotting the LD decay graph (Figure 2).The graph shows that the average LD coefficients for Xinjiang Brown cows, Xinjiang Brown bulls, Brown Swiss bulls, and Chinese Holstein cows at a genomic distance of 50 kb were approximately 0.2, 0.25, 0.3, and 0.35, respectively, indicating a gradual increase.Noteworthy, the decay rates of the LD coefficients vary among different populations.Among the breeds, Brown Swiss bulls exhibited the slowest LD decay at 0-40 kb, whereas the Chinese Holstein cows exhibited the fastest LD decay.However, in the range of 40-300 kb, Xinjiang Brown cows exhibited the fastest LD decay, with a decay rate order of Xinjiang Brown cows > Chinese Holstein cows > Xinjiang Brown bulls > Brown Swiss bulls.

Population structure analysis
To further investigate the genetic components of Xinjiang Brown and Chinese Holstein cows, population structure and phylogenetic tree analyses were conducted.As shown in Figure 3, when the number of ancestral populations K = 2, there was a clear distinction in the genetic structure among Xinjiang Brown cows, Xinjiang Brown bulls, Brown Swiss bulls, and Chinese Holstein cows.However, the genetic structure within each group differed insignificantly.As shown in Figure 4, Xinjiang Brown cows, Xinjiang Brown bulls, and Brown Swiss bulls were clustered, whereas the Chinese Holstein cows were clustered separately.In addition, Xinjiang Brown cows, Xinjiang Brown bulls, and Brown Swiss bulls appear at the end of a certain branch within the Chinese Holstein population.

Genetic relatedness between Xinjiang Brown and Chinese Holstein cows
Using the SNP genotyping information from 403 Xinjiang Brown cows, 71 Xinjiang Brown bulls, 11 Brown Swiss bulls, and  2,633 Chinese Holstein cows, a G-matrix was constructed.Figure 5 was then generated on the basis of the actual genetic relatedness among individuals in the G-matrix.Figure 5 shows that the kinship coefficients among Xinjiang Brown cows, Xinjiang Brown bulls, and Brown Swiss bulls populations were approximately 0.5, which is significantly higher than those among individuals within the Chinese Holstein cow population.The kinship coefficients between Xinjiang Brown cows, Xinjiang Brown bulls, Brown Swiss bulls, and Chinese Holstein cows populations tend toward 0.

PCA
PCA was performed using the genomic kinship relationship matrix (G-matrix) among individuals from the two breeds (Figures 2-7).The results revealed that the first principal component (PC1, accounting for 4.75%) separated Xinjiang Brown cows, Xinjiang Brown bulls, Brown Swiss bulls, and Chinese Holstein cows into distinct groups.Specifically, Xinjiang Brown cows, Xinjiang Brown bulls, and Brown Swiss bulls were closely clustered.The second principal component (PC2, accounting for 1.76%) could not distinguish between Xinjiang Brown cows, Xinjiang Brown bulls, and Brown Swiss bulls; however, it separated the Chinese Holstein cows into two distinct groups.The third principal component (PC3, accounting for 1.22%) further distinguished the Chinese Holstein cows into two groups.

Random grouping of the joint reference group
The chip data of 2,633 Chinese Holstein cows were screened to eliminate individuals with distant kinship within the Chinese Holstein cattle population, leaving 2,271 genotyped Chinese Holstein cows.Among these, 2,100 cows were randomly selected as the total population of Chinese Holstein cows to be included in the joint reference group.Random equal-sized groupings were then performed on 2,100 genotyped Chinese Holstein cows, resulting in subsets with different numbers of cows.As shown in Figure 7, the distribution of the subsets in the total population was relatively scattered for the Chinese Holstein cows added to the joint reference group.

Estimation of genetic parameters of dairy traits in the joint reference group
As shown in Table 5, before the inclusion of the Chinese Holstein population, the heritability of 305dMY was 0.204 without excluding the phenotypic data of 50 genotype Xinjiang Brown cows, which decreased to 0.203 after data exclusion.When varying numbers of Chinese Holstein cows were incorporated into the joint reference group, the heritability of  305dMY in Xinjiang Brown cows was 0.137-0.249without excluding the phenotypic data of the 50 genotype cows.However, after excluding these data, the heritability of 305dMY in Xinjiang Brown cows was adjusted to 0.169-0.254.
As shown in Table 6, without the inclusion of Chinese Holstein cows, the heritability of MFY, was 0.07 when the phenotypic data of 50 genotype Xinjiang Brown cows were included, and it was 0.073 when the phenotypic data were excluded.When different numbers of Chinese Holstein cows were added to the reference population, the MFY, heritability of Xinjiang Brown cows was 0.073-0.086when the phenotypic data of 50 genotype Xinjiang Brown cows were included, and it was 0.057-0.085when the phenotypic data were excluded.
As shown in Table 7, without the inclusion of the Chinese Holstein cows, the heritability of MPY, was 0.143 when the phenotypic data of 50 genotyped Xinjiang Brown cows were included, and it was 0.145 when the phenotypic data were excluded.When different numbers of Chinese Holstein cows were added to the reference population, the MPY, heritability of Xinjiang Brown cows was 0.123-0.158when the phenotypic data of the 50 genotyped Xinjiang Brown cows were included, and it was 0.0142-0.174when the phenotypic data were excluded.
Table 8 shows that without the inclusion of the Chinese Holstein cow population, the heritability of SCS was 0.042 when the phenotypic data of the 50 genotyped Xinjiang Brown cows were included and 0.043 when the phenotypic data were excluded.After adding different numbers of the Chinese Holstein cows to the joint reference population, the SCS heritability for Xinjiang Brown cows was 0.02-0.062when the phenotypic data of the 50 genotyped Xinjiang Brown cows were included.When the phenotypic data were excluded, the SCS heritability for Xinjiang Brown cows was 0.015-0.081.Gradient grouping of PCA analysis in joint reference group.

Verification of the genomic breeding value reliability
As shown in Table 9, when different numbers of Chinese Holstein cows were added to the joint reference population, the reliability of the total population genomic breeding values for 305dMY of Xinjiang Brown cows was 0.142-0.340,with a regression coefficient of 0.129-0.312.The reliability of the genomic breeding values for the validation population was −0.033-0.087,with a regression coefficient of −0.064-0.056.
As shown in Table 10, when different numbers of the Chinese Holstein cows were added to the joint reference population, the reliability of the total population genomic breeding values for MFY, of Xinjiang Brown cows was 0.263-0.424,with a regression coefficient of 0.296-0.437.The reliability of the genomic breeding values for the validation population was −0.149-0.138,with a regression coefficient of −0.192-0.137.
As shown in Table 11, when different numbers of Chinese Holstein cows were added to the joint reference population, the reliability of the total population genomic breeding values for MPY, of Xinjiang Brown cows was 0.28-0.465,with a regression coefficient of 0.277-0.504.The reliability of the genomic breeding values for the validation population was −0.259-0.203,with a regression coefficient of −0.213-0.032.
As shown in Table 12, when different numbers of Chinese Holstein cows were added to the joint reference population, the reliability of the total population genomic breeding values for SCS, of Xinjiang Brown cows was 0.190-0.448,with a regression coefficient of 0.207-0.502.The reliability of the genomic breeding values for the validation population was −0.145-0.062,with a regression coefficient of −0.223-0.075.

Genetic structure analysis
LD is a metric that quantifies whether genotype variations in two SNP markers are relatively consistent and whether they are correlated (Park, 2012).If two loci with adjacent alleles are correlated, certain genotypes tend to be co-inherited, resulting in a higher frequency of certain haplotypes than expected.This pattern can be visually represented using an LD decay plot (Amaral et al., 2008).Because of the correlated inheritance of the two loci, the decay rate of the LD coefficient decreases with increasing generations and recombination events.Genetic background also influences it.Domestication selection reduces genetic diversity within a population, reinforcing the correlation or linkage between SNP loci (Farnir et al., 2000).Consequently, populations with higher degrees of domestication exhibit stronger selection intensities (Odani et al., 2006), resulting in slower LD decay rates.The higher selection intensity in breeding bulls likely reduced the effective population size, thereby affecting LD in these groups.Meanwhile, the LD decay patterns in Xinjiang Brown and Chinese Holstein cows exhibit potential similarities, favoring the construction of a combined reference population.Overall, the LD decayed fastest in Xinjiang Brown cows, indicating lower levels of selection than the other three groups.This suggests a high level of genetic diversity in Xinjiang Brown cows, which harbor rich genetic resources with potential for development and use.These findings provide a scientific basis for the conservation, exploitation, and use of genetic diversity in Xinjiang Brown cattle.
Genetic structure analysis can elucidate phylogenetic relationships and genetic distances among different populations (Whelan and Goldman, 2001).Under the influence of natural and artificial selection, populations exhibiting pronounced genetic differences are evident.When K = 2, there is a distinct genetic structure differentiation among Xinjiang Brown cows, Xinjiang Brown bulls, Brown Swiss bulls, and Chinese Holstein cows.This is related to the breeding strategies employed during the intense selection process of Xinjiang Brown and Chinese Holstein cattle, where most female progenitors in the early stages of population breeding originated from local Chinese yellow cattle (Liu, 2013).
The breeding of Xinjiang Brown and Chinese Holstein cattle involves the introduction of foreign breeds for crossbreeding to improve and enhance the local yellow cattle population in China.Subsequently, through crossbreeding fixation and selective breeding, these breeds have been further developed and stabilized.The genetic background of Xinjiang Brown cattle is traceable to the original crossbreeding improvement in 1951, when the maternal breed was Kazakh cattle (Ma, 2015).Conversely, the genetic background of the  (Liu, 2013).The distinct genetic structure observed among Xinjiang Brown cows, Xinjiang Brown bulls, Brown Swiss bulls, and Chinese Holstein cows indicates a significant genetic distance between these two major groups.This significantly affects the genetic structure of multi-breed joint reference genomes established later, subsequently affecting the accuracy of genomic predictions.
Figure 5 shows that the coefficient of kinship among individuals within the populations of Xinjiang Brown cows, Xinjiang Brown bulls, and Brown Swiss bulls was relatively high.However, the coefficient of kinship between these groups and the Chinese Note: "All data" refers to the calculation results without excluding the phenotypic data of the 50 validation animals; "All data-50" refers to the calculation results after excluding the phenotypic data of the 50 validation animals.σ 2 a = additive genetic variance; σ 2 E = residual variance; h 2 = heritability; SE, standard error.
Holstein cow population tended to be close to 0. This result suggests that when conducting multi-breed genomic selection, it is not advisable to use only one breed as the reference population and the other as the selection population.Instead, it is necessary to fully consider the relationship between the selection and reference populations.When the kinship between the two breeds involved in multi-breed genomic selection is weak or extremely weak, it is necessary to include a certain number of individuals from the same breed in the reference population to ensure high reliability in the estimation of genomic breeding values.
The PCA results identified two major clusters: one containing Xinjiang Brown cows, Xinjiang Brown bulls, Brown Swiss bulls, and Chinese Holstein cows, whereas the other mainly contained Chinese Holstein cows.This suggests a relatively distant genetic relationship between these two clusters.Clustering of Xinjiang Brown cows, Xinjiang Brown bulls, and Brown Swiss bulls is highly concentrated, Note: "All data" refers to the calculation results without excluding the phenotypic data of the 50 validation animals; "All data-50" refers to the calculation results after excluding the phenotypic data of the 50 validation animals.σ 2 a = additive genetic variance; σ 2 E = residual variance; h 2 = heritability; SE, standard error.
indicating a close genetic distance among these groups.When considering only the clustering of Chinese Holstein cows, a small subset can be distinguished from the larger population, which exhibits a distant genetic relationship with Xinjiang Brown cows, Xinjiang Brown bulls, and Brown Swiss bulls and with most Chinese Holstein cows.In the subsequent multi-breed genetic evaluations, it is recommended to exclude chip and phenotypic data from this subset of Chinese Holstein cows and their respective farms.This approach can help reduce the interference of genetic structure differences while estimating SNP effects.These findings suggest a low genetic linkage between Xinjiang Brown and Chinese Holstein cattle.

Genetic parameter analysis of the joint reference population
The number of genotyped Chinese Holstein cattle in the joint reference population significantly affects the estimation of variance components.Therefore, it is necessary to consider the number of genotyped animals from different populations in the joint reference population during multi-breed genetic evaluations.The most important factors that affect the reliability of genomic breeding value estimation are the proportion of genetic variance explained by SNPs and trait heritability (Steyn et al., 2019;van Grevenhof et al., 2019).These genetic parameters are directly related to the size and structure of the training population as well as the range, quality, and quantity of phenotypic and genomic information available for individuals in the training population.To ensure accurate genomic breeding value estimation, it is important to minimize the relationship between genotyped individuals within the training population and maximize the relationship between the training and prediction populations.Because ssGBLUP generates genomic breeding values for cows, it is particularly useful for cows with only parental average information.The single-step genomic evaluation combines information from all countries, considering potential duplicate counting of the same information, thereby

Reliability of genetic evaluation in joint reference populations
In theory, a model that assumes the closest distribution of SNP effects to their true distribution can achieve the highest reliability in genomic prediction.The GBLUP model assumes that all SNP effects follow the same normal distribution and compresses the effects of all SNPs to the same degree because different models have different assumptions about the distribution of SNP effects (Villar-hernÁndez et al., 2021).Several methods have been proposed to improve the accuracy of genomic prediction in small populations of dairy cattle (Marjanovic et al., 2021), and one effective approach is to use joint reference populations by combining reference data from different populations (Steyn et al., 2019;van Grevenhof et al., 2019).This method has reported significant benefits in genomic prediction for North American Holstein, European Holstein, Chinese Holstein, and Brown Swiss populations (Vanderick et al., 2017;van den berg et al., 2016).However, the accuracy of genomic prediction is dependent on the relationship between candidate and reference animals, requiring the reference population to be sufficiently close to the target population (Xu et al., 2019).Therefore, for multi-breed joint genetic evaluation between Xinjiang Brown and Chinese Holstein cattle, considering only the added number of animals is insufficient.Further in-depth analysis of important influencing factors, including assumptions about SNP effects (van den berg et al., 2019) and the weights of the A-and G-matrices in the H-matrix (Karaman et al., 2018;Botelho et al., 2021), is required to improve the accuracy and unbiasedness of predictions (Botelho et al., 2021).
Including cows in the genotyping reference population is necessary because of the limited number of cows with reliable phenotypic information available for predicting offspring traits.To improve the genomic breeding values of the population, it is necessary to include a certain number of validated bulls with reliable phenotypic information in the reference population (Vanraden et al., 2020).Previous studies have reported that the inclusion of cows in the validated bull reference population can improve the accuracy of genomic prediction (Ding et al., 2013).Although the phenotypic information for cows is less accurate than that for bulls with offspring validation, additional information can still be significant (Cole et al., 2021), considering the large number of cows available as reference animals.

Conclusion
The genetic structure of mature Xinjiang Brown and Chinese Holstein cows is different, and the individual kinship between these two populations is relatively distant.This increases the impact of genetic structure and kinship on the reliability of genomic breeding value estimation.Through comparisons of parameters, including heritability, breeding value reliability, and unbiasedness, it was initially determined that including 600 and 900 Chinese Holstein cows in the joint reference population positively impacted the genomic prediction of Xinjiang Brown cattle to some extent.In multi-breed genome selection, it is necessary to pre-evaluate the genetic structure and genetic relationship of the population.It is feasible to combine the Chinese Holstein cattle population into the Xinjiang brown cattle population to form a joint reference group for crossbreed genetic assessment.It can provide theoretical guidance for applied genomic genetic assessment and multi-breed genomic genetic assessment of Xinjiang brown cattle, and also provide reference for genome selection of other dual-use cattle and small population breeds.

FIGURE 1
FIGURE 1Venn diagram of GeneSeek GGP Bovine 150 k after quality control in Xinjiang Brown cattle and Chinese Holstein cattle.

FIGURE 2
FIGURE 2 LD decay of Xinjiang Brown cattle and Chinese Holstein cow.BSB is Brown Swiss Bull, CHC is Chinese Holstein cow, XBB is Xinjiang Brown Bull, XBC is Xinjiang Brown Cow.

FIGURE 3
FIGURE 3Analysis chart of population structure in Xinjiang Brown cattle and Chinese Holstein cow.BSB is Brown Swiss Bull, CHC is Chinese Holstein cow, XBB is Xinjiang Brown Bull, XBC is Xinjiang Brown Cow.

3. 2
Multi-breed genomic evaluation using a joint reference population 3.2.1 Descriptive statistics of the dairy traits of Xinjiang brown and Chinese Holstein cattleTable 4 lists the statistics, including sample size, minimum value, maximum value, mean, standard deviation, and coefficient of variation of the observed dairy traits of Xinjiang Brown and Chinese Holstein cattle.305dMY, MFY, MPY, and SCS between Xinjiang Brown and Chinese Holstein cattle differed significantly.

FIGURE 4
FIGURE 4Phylogenetic tree of Xinjiang Brown cattle and Chinese Holstein cow.BSB is Brown Swiss Bull, CHC is Chinese Holstein cow, XBB is Xinjiang Brown Bull, XBC is Xinjiang Brown Cow.

FIGURE 5
FIGURE 5Genomic relationship matrix of Xinjiang Brown cattle and Chinese Holstein cow.BSB is Brown Swiss Bull, CHC is Chinese Holstein cow, XBB is Xinjiang Brown Bull, XBC is Xinjiang Brown Cow.

FIGURE 6
FIGURE 6Principal Component Analysis of Xinjiang Brown Cattle and Chinese Holstein cow.

FIGURE 7
FIGURE 7 Phenotypically complete Xinjiang Brown cattle were screened from various Xinjiang Brown cattle farms for chip detection.After screening, 403 cows and 82 bulls from four core farms in Xinjiang region were selected.Moreover, we included 174 Xinjiang Brown cows from Xinjiang Uygur Autonomous Region State-owned Urumqi breeding farm, 50 Xinjiang Brown cows from Xinjiang Tianshan Animal Husbandry Bioengineering Co., Ltd.breeding farm, 130 Xinjiang Brown cows from the Tacheng Agriculture and Animal Husbandry Technology Co., Ltd., 49 Xinjiang Brown cows from Yili New Brown breeding farm, 71 bulls and 11 Brown Swiss bulls from Xinjiang Tianshan Animal Husbandry Bioengineering Co., Ltd.Bull breeding station.Chip data for Chinese Holstein cows were obtained from 2,633 animals in

TABLE 1
The standards for data filtering.

TABLE 2
Data statistics.

TABLE 3
Gradient grouping of joint reference group.

TABLE 4
Description of milk traits in Xinjiang Brown Cattle and Chinese Holstein Cattle.305 dMY: 305 daily milk yield; MFY: milk fat yield; MPY: milk protein yield; SCS: somatic cell score.SD: Standard deviation.
a During this period, China initially introduced various dairy breeds, including the Holstein cattle, Jersey cattle, Ayrshire cattle, Brown Swiss cattle, and Shorthorn cattle

TABLE 5
Genetic parameter estimation of 305dMY in joint reference population based on ssGBLUP.

TABLE 6
Genetic parameter estimation of MFY in joint reference population based on ssGBLUP.

TABLE 7
Genetic parameter estimation of MPY in joint reference population based on ssGBLUP.

TABLE 8
Genetic parameter estimation of SCS in joint reference population based on ssGBLUP.The ssGBLUP method can simultaneously analyze phenotypic, genomic, and pedigree information from both genotype and nongenotype animals by integrating external information.This method ensuring more accurate estimation of genomic breeding values.Adding carefully selected cows to the training population can expand the population, improve its structure and relationship with the prediction population, and reduce selection bias.However, in this study, a large proportion of the chip data came from cows, with relatively fewer bulls for validation, probably being a reason for the poor prediction performance.

TABLE 9
Genetic parameter estimation of 305dMY in joint reference population based on ssGBLUP.TABLE 10 Genetic parameter estimation of MFY in joint reference population based on ssGBLUP.foreign paternal lines with no or few offsprings.This research result suggests that we can conduct cross-country genetic evaluations with the Brown Swiss bull origin introduced during the breeding of Xinjiang Brown cattle, which could improve the reliability of genomic predictions for Xinjiang Brown cattle.
TABLE 12 Genetic parameter estimation of SCS in joint reference population based on ssGBLUP.