
DATA REPORT article

Front. Anim. Sci.

Sec. Animal Breeding and Genetics

Analysis of Bovine Population Structure and Genetic Diversity of Wild and Related Species in the Ili River Valley

Provisionally accepted
Cheng Hou1,2, Tongjun Guo1,2,3*, Bo Ho1,2
  • 1Xinjiang Academy of Animal Science, Urumqi, China
  • 2Xinjiang Agricultural University, Urumqi, China
  • 3Xinjiang Key Laboratory of Feed Biotechnology, Urumqi, China

The final, formatted version of the article will be published soon.

The genetic diversity of bovine species is a core pillar of sustainable animal husbandry and is closely linked to geographical isolation, breeding history, and artificial selection. The Ili River Valley in Xinjiang, a distinctive ecological region, has nurtured diverse bovine genetic resources. These include indigenous primitive breeds adapted to harsh environments (such as Kazakh cattle and Altay White-headed cattle), the dual-purpose Xinjiang Brown cattle developed through multi-breed crossbreeding [1,2], and introduced European specialized breeds (such as Angus and Simmental cattle) [3]. These populations are valuable genetic resources for sustainable breeding, climate resilience, and future improvement programs. However, they may face genetic erosion under intensified selection and ongoing environmental change. From a broader biodiversity and conservation genetics perspective, genome-wide assessments provide essential evidence for prioritizing conservation actions. This is particularly true in regions experiencing habitat fragmentation and declining effective population sizes. To establish a more comprehensive genetic reference system, this study also incorporates a wild relative species, the yak, which inhabits the high-altitude regions of Kizilsu Kirghiz Autonomous Prefecture. The aim is to decipher population genetic patterns within a complete "local-exotic-wild relative" framework. Existing research in this region, however, suffers from substantial fragmentation and technical limitations. Most studies have focused on a single breed or a small number of breeds [4,5]; for example, Zhang Tao (2024) analyzed only structural variation in Xinjiang Brown cattle. As a result, systematic joint analyses of all major cattle populations in the region are lacking. Technologically, early studies relied predominantly on low-throughput markers such as SSRs or mitochondrial DNA [6,7].
In addition, reference genomes built from taurine cattle (Bos taurus) may show coverage bias when used to analyze indigenous breeds and yak, potentially underestimating genetic diversity [8]. Furthermore, no study has compared domestic cattle populations in the Ili River Valley with their wild relative, the yak, at the whole-genome level. The boundaries of genetic differentiation and the relationships between domestic and wild-relative populations therefore remain unclear [9,10]. A genome-wide comparison that integrates indigenous cattle, introduced breeds, and yak within a single analytical framework is still lacking for the Ili River Valley. This study therefore aims to systematically elucidate the genetic structure and diversity of seven bovine populations (six domesticated cattle populations and one yak population) from the Ili River Valley and adjacent regions using whole-genome resequencing [11]. The core objectives are as follows. (1) To clarify the genetic differentiation among "local-exotic-wild relative" populations through principal component analysis (PCA), phylogenetic trees, and population genetic structure analysis [12,13]. (2) To quantify and compare genetic diversity across populations using polymorphism information content (PIC), heterozygosity, and linkage disequilibrium (LD) decay, thereby identifying high-diversity and vulnerable groups [14]. (3) To integrate population genetic findings with historical introduction records to explore the hybridization history and genetic composition of Xinjiang Brown cattle [15,16]. The theoretical motivation of this study is to fill the gap in systematic, multi-population, genome-wide genetic assessments in this region.
Its practical significance is to provide reliable molecular evidence for two purposes. The first is formulating tiered conservation strategies [17,18], for example establishing core conservation populations for the highly diverse Kazakh cattle and designating protected areas for the low-diversity but uniquely adapted yak. The second is guiding future crossbreeding programs [19]. The paper first describes sample collection and analytical methods, then presents results on population genetic structure and diversity, and discusses these findings in depth. It concludes with targeted resource conservation and breeding strategies.
This study used whole-genome resequencing to analyze 65 samples from seven populations: Angus cattle, Simmental cattle, Holstein cattle, Xinjiang Brown cattle, Kazakh cattle, Altay White-headed cattle, and yak. For each animal, 10 mL of jugular vein blood was drawn into an EDTA anticoagulant tube and stored at -20°C. Genomic DNA was extracted using the EasyPure Blood Genomic DNA Kit (Quanshijin, Beijing). After the DNA passed sample quality control, it was randomly sheared using a Covaris ultrasonicator. The library preparation workflow comprised end repair, A-tailing, sequencing adapter ligation, fragment size selection, PCR amplification, and purification, yielding the final DNA library.
Raw sequencing reads include reads that contain adapter sequences or have low quality. To ensure the quality of downstream analyses, raw reads were filtered to obtain clean reads, on which all subsequent analyses were based. The filtering steps were: (1) remove read pairs containing adapter sequences; (2) remove read pairs in which Ns exceed 10% of the read length; (3) remove read pairs in which low-quality bases (Q ≤ 5) exceed 50% of the read length. Clean reads were aligned to the Bos taurus reference genome (ARS-UCD1.2) using BWA v0.7.17-r1188 [20].
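The three read-pair filters above translate into simple per-read checks. The following is a minimal sketch under stated assumptions, not the pipeline actually used (the text does not name the filtering tool). It assumes Phred+33 quality encoding and detects adapter contamination by an exact substring match against a single caller-supplied adapter sequence, whereas production tools usually use alignment-based adapter detection. The names `Read`, `n_fraction`, `low_quality_fraction`, and `keep_pair` are hypothetical.

```python
from dataclasses import dataclass

PHRED_OFFSET = 33  # assumption: Sanger / Illumina 1.8+ quality encoding


@dataclass
class Read:
    seq: str
    qual: str  # one ASCII-encoded Phred quality character per base

    def __post_init__(self) -> None:
        if not self.seq or len(self.seq) != len(self.qual):
            raise ValueError("sequence and quality strings must be non-empty and equal in length")


def n_fraction(read: Read) -> float:
    """Fraction of bases called as N."""
    return read.seq.upper().count("N") / len(read.seq)


def low_quality_fraction(read: Read, q_max: int = 5) -> float:
    """Fraction of bases whose Phred score is <= q_max."""
    low = sum(1 for c in read.qual if ord(c) - PHRED_OFFSET <= q_max)
    return low / len(read.qual)


def keep_pair(r1: Read, r2: Read, adapter: str,
              max_n: float = 0.10, max_low_q: float = 0.50, q_max: int = 5) -> bool:
    """Apply the three pair-level filters; the whole pair is dropped if either mate fails."""
    for read in (r1, r2):
        if adapter in read.seq:  # (1) adapter contamination; exact substring match only
            return False
        if n_fraction(read) > max_n:  # (2) Ns exceed 10% of the read length
            return False
        if low_quality_fraction(read, q_max) > max_low_q:  # (3) bases with Q <= 5 exceed 50%
            return False
    return True
```

Rejecting the whole pair when either mate fails keeps the R1 and R2 files synchronized, which paired-end alignment with BWA relies on.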
To facilitate cross-population comparisons, yak (WY) reads were also mapped to ARS-UCD1.2. Because yak is a wild relative, mapping to a cattle reference may introduce reference bias and underestimate diversity in yak. Duplicate reads were removed using SAMtools v1.15 (rmdup) [21]. SNPs were called using bcftools v1.16 [22] and filtered using the following criteria: (i) read depth (DP) ≥ …
Footnote: Yak (WY) reads were mapped to ARS-UCD1.2 for cross-population comparison. However, mapping a wild relative to a cattle reference may introduce reference bias, for example through reduced mapping efficiency and/or missed variants in divergent regions. This can potentially underestimate genetic diversity and affect LD-related metrics in yak, so diversity and LD results involving yak should be interpreted with this consideration in mind.
Genetic diversity analysis was performed on the quality-controlled genomic data from the seven populations. Expected heterozygosity (He), observed heterozygosity (Ho), and polymorphism information content (PIC) were calculated for each population using PLINK v1.90 [23]. LD decay, expressed as r² between SNP pairs as a function of their distance, was computed with PopLDdecay v3.40 [24]. Linkage disequilibrium (LD) is the non-random association of alleles at different loci within a population. Two loci are in LD when alleles at the two loci occur together more or less often than expected from their individual frequencies. The strength of LD is typically measured by D′ and r². Wild species generally show lower LD, whereas domesticated species, influenced by positive selection, tend to show higher LD. The distribution of LD along chromosomes is commonly depicted with LD decay plots, which show how quickly LD declines with genetic or physical distance.
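The per-SNP quantities behind He, Ho, PIC, and the LD decay curve can be written out directly. The sketch below is a minimal pure-Python illustration, not PLINK's or PopLDdecay's implementation. It rests on three assumptions:

- biallelic SNPs are coded as 0/1/2 copies of the non-reference allele;
- He is computed as 2pq without a small-sample correction;
- r² is approximated by the squared correlation of unphased genotype dosages, and distance bins are half-open intervals of a chosen width.

All function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Genotype = Optional[int]  # 0/1/2 copies of the non-reference allele; None = missing call


def snp_diversity(genotypes: Sequence[Genotype]) -> Tuple[float, float, float]:
    """Return (Ho, He, PIC) for one biallelic SNP, ignoring missing calls."""
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes at this SNP")
    p = sum(called) / (2 * len(called))  # non-reference allele frequency
    q = 1.0 - p
    ho = sum(1 for g in called if g == 1) / len(called)  # observed heterozygote fraction
    he = 1.0 - (p * p + q * q)  # expected heterozygosity under HWE (= 2pq), no correction
    pic = he - 2.0 * (p * p) * (q * q)  # Botstein et al. PIC for two alleles
    return ho, he, pic


def population_means(snps: Sequence[Sequence[Genotype]]) -> Tuple[float, ...]:
    """Average Ho, He and PIC over SNPs; snps[k] holds all individuals' genotypes at SNP k."""
    stats = [snp_diversity(column) for column in snps]
    return tuple(sum(s[j] for s in stats) / len(stats) for j in range(3))


def genotype_r2(x: Sequence[int], y: Sequence[int]) -> float:
    """Squared Pearson correlation of allele dosages at two SNPs (no missing calls allowed)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return math.nan  # r^2 is undefined when either SNP is monomorphic in the sample
    return sxy * sxy / (sxx * syy)


def ld_decay(positions: Sequence[int], snps: Sequence[Sequence[int]],
             max_dist: int, bin_size: int) -> List[float]:
    """Mean r^2 per distance bin [k*bin_size, (k+1)*bin_size) for SNP pairs up to max_dist apart."""
    n_bins = max_dist // bin_size + 1
    sums, counts = [0.0] * n_bins, [0] * n_bins
    for a in range(len(positions)):
        for b in range(a + 1, len(positions)):
            dist = abs(positions[b] - positions[a])
            if dist > max_dist:
                continue
            r2 = genotype_r2(snps[a], snps[b])
            if math.isnan(r2):
                continue
            k = dist // bin_size
            sums[k] += r2
            counts[k] += 1
    return [s / c if c else math.nan for s, c in zip(sums, counts)]  # NaN marks empty bins
```

Plotting the binned means against distance gives an LD decay curve. A population with faster decay reaches low r² at shorter distances.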
A phylogenetic tree (also called an evolutionary tree) is a branching diagram that describes the order of divergence among populations and thus represents their evolutionary relationships. The degree of relatedness between populations can be inferred from similarities or differences in their physical or genetic characteristics. Phylogenetic trees were constructed using the neighbor-joining (NJ) method. After SNP detection, the SNPs were used to calculate pairwise genetic distances between individuals. The distance between two individuals i and j was computed as
$$D_{ij} = \frac{1}{L}\sum_{l=1}^{L} d_{ij}^{(l)},$$
where L is the length of the high-quality SNP region. If the alleles at position l are A/C, then
$$d_{ij}^{(l)} = \begin{cases} 0, & \text{if the genotypes of } i \text{ and } j \text{ are AA and AA} \\ 0.5, & \text{if AA and AC} \\ 0.5, & \text{if AC and AC} \\ 1, & \text{if AA and CC} \end{cases}$$
TreeBeST was chosen for phylogenetic reconstruction because it efficiently handles large genome-wide SNP datasets and provides robust distance-based tree inference, making it suitable for population-level analyses. TreeBeST v1.9.2 [25] (http://treesoft.sourceforge.net/treebest.shtml) was used to calculate the distance matrix, from which the NJ tree was constructed with 1,000 bootstrap replicates.
Principal component analysis (PCA) is a mathematical method that uses a linear transformation to convert many correlated variables into a smaller number of uncorrelated principal components that capture most of the variation. PCA is used in many disciplines; in genetics, it is mainly used for cluster analysis. Based on genome-wide SNP variation, individuals are grouped into distinct clusters along the leading principal components, and the results can be cross-validated against other methods. PCA was applied only to autosomal SNPs from the 65 individuals, excluding loci with more than two alleles and loci with missing data.
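The individual distance used for the tree is computed per site and then averaged: D_ij is the sum of per-site distances divided by L. The per-site values are 0 for identical homozygotes, 0.5 for a homozygote versus a heterozygote or for two heterozygotes, and 1 for opposite homozygotes. The resulting matrix is the input to neighbor joining. The sketch below is a from-scratch illustration, not TreeBeST's code, and makes three assumptions:

- genotypes are coded 0/1/2 as counts of one allele;
- L defaults to the number of SNPs supplied, although the method defines L as the length of the high-quality SNP region, which may be larger;
- the tree is built by plain Saitou-Nei neighbor joining without bootstrap replicates.

Function names are hypothetical.

```python
from typing import List, Optional, Sequence


def site_distance(gi: int, gj: int) -> float:
    """Per-site distance d_ij^(l) with genotypes coded 0/1/2 (e.g. AA/AC/CC)."""
    if gi == 1 and gj == 1:
        return 0.5  # AC vs AC counts as 0.5, not 0
    return abs(gi - gj) / 2  # identical homozygotes -> 0, hom vs het -> 0.5, AA vs CC -> 1


def pairwise_distance(x: Sequence[int], y: Sequence[int],
                      region_length: Optional[int] = None) -> float:
    """D_ij = (1/L) * sum of per-site distances; L defaults to the number of SNPs given."""
    L = region_length if region_length is not None else len(x)
    return sum(site_distance(a, b) for a, b in zip(x, y)) / L


def distance_matrix(genotypes: Sequence[Sequence[int]],
                    region_length: Optional[int] = None) -> List[List[float]]:
    """Symmetric individual-by-individual matrix; the diagonal is set to 0 explicitly."""
    n = len(genotypes)
    return [[0.0 if i == j else pairwise_distance(genotypes[i], genotypes[j], region_length)
             for j in range(n)] for i in range(n)]


def neighbor_joining(names: Sequence[str], dist: Sequence[Sequence[float]]) -> str:
    """Textbook Saitou-Nei neighbor joining; returns an unrooted tree as a Newick string."""
    nodes = list(names)
    d = [list(row) for row in dist]
    while len(nodes) > 2:
        n = len(nodes)
        r = [sum(row) for row in d]
        best = None  # pair minimizing Q(i, j) = (n - 2) d(i, j) - r_i - r_j
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * d[i][j] - r[i] - r[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))  # branch length from new node to i
        lj = d[i][j] - li
        joined = f"({nodes[i]}:{li:.4f},{nodes[j]}:{lj:.4f})"
        keep = [k for k in range(n) if k not in (i, j)]
        new_row = [0.5 * (d[i][k] + d[j][k] - d[i][j]) for k in keep]
        d = [[d[a][b] for b in keep] + [new_row[x]] for x, a in enumerate(keep)] + [new_row + [0.0]]
        nodes = [nodes[k] for k in keep] + [joined]
    half = d[0][1] / 2  # split the last edge evenly between the two remaining clusters
    return f"({nodes[0]}:{half:.4f},{nodes[1]}:{half:.4f});"
```

Because two heterozygotes contribute 0.5 rather than 0, a heterozygous individual compared with itself would get a nonzero distance. The diagonal is therefore fixed at 0 instead of being computed.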
The PCA procedure was as follows. For SNP k in individual i, the genotype $d_{ik}$ takes a value in $\{0, 1, 2\}$: 0 denotes homozygosity for the reference allele, 1 heterozygosity, and 2 homozygosity for the non-reference allele. M is the n × S matrix (n individuals × S SNPs) of standardized genotypes, in which each $d_{ik}$ is standardized using $E(d_k)$, the mean of $d_k$ across individuals. The n × n sample covariance matrix of individuals is then calculated as $X = MM^{\mathsf{T}}/S$. PCA was performed on the filtered SNPs with GCTA v1.24.2 [26], yielding each sample's scores on the principal components (PCs), and the values of the top three PCs were plotted.
Population genetic structure refers to the non-random distribution of genetic variation within a species or population. A population can be divided into subgroups based on geographic distribution or other criteria. Individuals within the same subgroup are more closely related, whereas individuals in different subgroups are more distantly related. Population structure analysis aids in understanding evolutionary processes and can assign individuals to subpopulations, which is also useful in genotype-phenotype association studies. In this study, ADMIXTURE v1.3.0 [27] was used to infer population genetic structure and ancestry composition.
This sequencing run generated … SNP marker density across the seven populations showed a consistent chromosome-level pattern, with larger chromosomes showing higher densities and smaller chromosomes lower densities. Significant differences nevertheless existed between populations (Figure 1). The neighbor-joining tree corroborated the PCA and structure results. The LD decay analysis (Figure 6) showed that LD decay distance was strongly negatively correlated with genetic diversity; in other words, LD decayed faster in more diverse populations.
The local breeds with the highest genetic diversity (KAZ, AWH, XH) showed the fastest LD decline. The introduced breeds (ANG, SIM, HOL), influenced by artificial selection, showed slower LD decline. The yak (WY) showed the slowest LD decline, consistent with its lowest level of genetic diversity.
Population genetic diversity is the fundamental basis of evolutionary potential and environmental adaptability, and its accurate assessment is essential for the effective conservation and utilization of animal genetic resources [28,29]. With the rapid development of whole-genome resequencing (WGS) technologies, genome-wide evaluations of population structure and diversity have become increasingly reliable and informative [29]. This is particularly true for complex livestock systems that include indigenous breeds, introduced breeds, and wild relatives. In this study, WGS data were used to systematically characterize the genetic diversity, linkage disequilibrium (LD) patterns, and population structure of seven bovine populations from the Ili River Valley and adjacent regions. Together, these analyses provide an integrated genetic framework for regional cattle resource management. Polymorphism information content (PIC) is a widely used indicator of locus-level polymorphism and overall population diversity [30]. Botstein et al. [30] proposed that PIC values between 0.25 and 0.50 represent moderate polymorphism. Genome-wide SNP-based studies, however, frequently report substantially lower PIC values because rare variants predominate across the genome [31]. In the present study, all populations had PIC values below 0.25, consistent with whole-genome resequencing analyses in cattle and other livestock species [31,32]. Notably, indigenous breeds such as Kazakh cattle (KAZ) and Altay White-headed cattle (AWH) had higher PIC values than introduced breeds and yak (WY). This forms a clear diversity gradient: indigenous breeds > introduced breeds > wild relatives.
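A short calculation with Botstein et al.'s PIC definition makes explicit why SNP-based PIC rarely reaches the 0.25 threshold. The derivation below is an added illustration for a biallelic SNP with allele frequencies p and q = 1 - p; it does not use the study's data.

```latex
\[ \mathrm{PIC} = 1 - \sum_{i=1}^{m} p_i^{2} - \sum_{i=1}^{m-1}\sum_{j=i+1}^{m} 2\,p_i^{2}p_j^{2} \]
\[ m = 2:\quad \mathrm{PIC} = 1 - p^{2} - q^{2} - 2p^{2}q^{2} = 2pq\,(1 - pq) \]
\[ \max_{p}\,\mathrm{PIC} = 2\cdot\tfrac{1}{4}\cdot\tfrac{3}{4} = 0.375 \quad \text{at } p = q = 0.5 \]
\[ \mathrm{PIC} \ge 0.25 \iff pq \ge \tfrac{1}{2}\Bigl(1 - \tfrac{1}{\sqrt{2}}\Bigr) \approx 0.146 \iff \min(p, q) \gtrsim 0.178 \]
```

A single SNP therefore cannot exceed 0.375. Any SNP with a minor allele frequency below roughly 0.18 contributes a PIC below 0.25. When low-frequency variants are common, a genome-wide mean below 0.25 is expected. PIC is then more informative for ranking populations against each other than for comparison with the 0.25 and 0.50 thresholds.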
Similar patterns have been reported in Qaidam cattle [32] and Xinjiang Brown cattle [40], where long-term natural selection under heterogeneous environments preserved adaptive genetic variation. In contrast, introduced breeds such as Angus and Holstein showed reduced diversity. This likely reflects intensive artificial selection, closed breeding systems, and directional selection for production traits [33,41]. The low PIC observed in yak populations may further reflect historical population bottlenecks and geographic isolation in high-altitude environments [39]. Observed heterozygosity (Ho) and expected heterozygosity (He) provide complementary insights into within-population genetic variation and equilibrium status. In this study, Ho and He values were highly consistent in the KAZ and AWH populations, suggesting approximate Hardy-Weinberg equilibrium and stable breeding structures. Similar equilibrium patterns have been reported in other indigenous cattle populations maintained under traditional pastoral systems [32,35]. Conversely, introduced breeds had slightly higher Ho than He, which may reflect recent admixture, cross-regional introductions, or management-driven outbreeding [33]. Angus cattle, in particular, are known to maintain relatively high heterozygosity during breed introduction and expansion [33]. Yak populations had the lowest Ho and He values, consistent with previous studies reporting reduced heterozygosity in geographically isolated yak populations [34,39]. Wang et al. [35] further demonstrated that gene flow among Kazakh cattle populations is regionally structured. This supports the interpretation that geographic connectivity plays a key role in shaping heterozygosity patterns. LD decay provides valuable information on population history, effective population size, and selection intensity.
In agreement with previous studies, populations with higher genetic diversity (KAZ, AWH, XH) showed faster LD decay. By contrast, populations under strong artificial selection (ANG, SIM, HOL) and bottlenecked populations (WY) showed slower LD decay [36-38]. Gao et al. [36] demonstrated that faster LD decay is generally associated with larger effective population sizes and reduced inbreeding, a pattern consistently observed in indigenous breeds across China. The LD decay patterns observed here are also consistent with findings in Hainan Yellow cattle [37] and Simmental cattle [38], further supporting the robustness of the observed diversity gradient. The slowest LD decay, seen in yak, should be interpreted cautiously. SNP ascertainment bias arising from mapping yak reads to a Bos taurus reference genome may partially inflate LD estimates and underestimate true diversity [39]. This methodological limitation highlights the importance of species-specific reference genomes for future yak genomic studies. Population structure analyses based on PCA, phylogenetic trees, and ADMIXTURE revealed a clear three-tier genetic differentiation pattern: wild relatives (yak), indigenous cattle, and introduced breeds. The complete separation of yak from domestic cattle is consistent with its distinct evolutionary history and interspecific divergence [39]. Introduced breeds formed compact clusters, reflecting genetic homogenization driven by standardized global breeding programs [40,41]. … European breeds [4,40]. Both genomic evidence and phenotypic studies indicate that XH cattle represent a stabilized composite breed with mixed ancestry [42]. This transitional genetic structure highlights the dual value of XH cattle as both a production-oriented breed and a reservoir of locally adapted alleles.
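Mixed ancestry such as that attributed to XH cattle is usually read from the ADMIXTURE ancestry matrix. This is the .Q output file, which has one row per individual (in the same order as the input .fam file) and K ancestry fractions per row. The minimal sketch below summarizes that matrix per population. Several parts are illustrative assumptions, not values or procedures reported in this study:

- the choice of K;
- the population labels;
- the 0.9 cut-off for flagging an individual as admixed.

Function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def read_q_lines(lines: Sequence[str]) -> List[List[float]]:
    """Parse ADMIXTURE .Q output: one individual per line, K whitespace-separated fractions."""
    return [[float(v) for v in line.split()] for line in lines if line.strip()]


def mean_ancestry_by_population(q: List[List[float]],
                                pops: Sequence[str]) -> Dict[str, List[float]]:
    """Average each ancestry component within population labels given in .Q row order."""
    if len(q) != len(pops):
        raise ValueError("need exactly one population label per .Q row")
    if not q:
        return {}
    k = len(q[0])
    sums: Dict[str, List[float]] = defaultdict(lambda: [0.0] * k)
    counts: Dict[str, int] = defaultdict(int)
    for row, pop in zip(q, pops):
        for c, frac in enumerate(row):
            sums[pop][c] += frac
        counts[pop] += 1
    return {pop: [s / counts[pop] for s in comps] for pop, comps in sums.items()}


def admixed_individuals(q: List[List[float]], pops: Sequence[str],
                        threshold: float = 0.9) -> List[Tuple[int, str]]:
    """Individuals whose largest ancestry component is below a (hypothetical) threshold."""
    return [(i, pops[i]) for i, row in enumerate(q) if max(row) < threshold]
```

Averaging per population shows whether a composite breed draws on several components. Per-individual flags distinguish a uniformly mixed breed from a population that contains a few recently crossed animals.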
This study integrates indigenous cattle, introduced breeds, and the wild relative yak within a unified genome-wide framework, providing a regional reference for conservation genetics and breeding decisions. Nevertheless, the sample sizes of introduced breeds were relatively small, which may limit statistical power for some comparisons. In addition, the minimum sequencing depth was modest and yak reads were mapped to a Bos taurus reference genome. Both factors may introduce reference and SNP ascertainment bias into diversity and LD decay estimates. Future studies with expanded sampling, deeper coverage, and yak-specific references will further refine these inferences. Collectively, these results show that cattle genetic resources in the Ili River Valley exhibit pronounced structural differentiation and diversity gradients shaped by natural selection, artificial selection, and geographic isolation. Indigenous breeds such as KAZ and AWH retain high genetic diversity and should be prioritized for in situ and ex situ conservation to safeguard adaptive genetic resources [28,36]. For Xinjiang Brown cattle, breeding strategies should balance the use of heterosis with the preservation of indigenous genetic components, to prevent excessive dilution of local alleles [40,42]. Although yak populations show lower overall diversity, they harbor unique high-altitude adaptive genes and represent an irreplaceable wild genetic resource [39]. Establishing protected breeding zones and promoting controlled gene flow among regional yak populations may enhance long-term population viability. Overall, the tiered conservation and breeding strategy proposed here provides a scientifically grounded framework for sustainable livestock development in this region.
Through whole-genome resequencing, this study systematically analyzed the genetic structure and diversity of seven bovine populations in the Ili River Valley, yielding the following core conclusions. (1) Population structure exhibits a clear three-tiered differentiation pattern, wild relative (yak) → introduced breeds → local breeds, driven by geographic isolation and artificial selection. (2) Genetic diversity follows the gradient "indigenous primitive breeds > Xinjiang Brown cattle > introduced breeds > yak", and LD decay rates closely track these diversity levels. (3) The genome of Xinjiang Brown cattle reflects a history of multi-breed hybridization, consistent with its documented breeding history. (4) The Kizilsu Prefecture yak population, despite its low overall diversity, is a valuable wild genetic resource because of its distinct genetic identity. We recommend a tiered conservation strategy with three components. First, core conservation populations should be established immediately for the genetically diverse Kazakh and Altay White-headed cattle. Second, an in situ conservation area should be designated for the Kizilsu Prefecture yak. Third, scientific breeding programs for Xinjiang Brown cattle should balance hybrid vigor with the preservation of local genetic integrity. This will secure a robust genetic foundation for sustainable livestock development in the region.

Keywords: admixture, conservation genetics, Indigenous cattle breeds, linkage disequilibrium decay, Whole-genome resequencing, Yak

Received: 11 Nov 2025; Accepted: 19 Jan 2026.

Copyright: © 2026 Hou, Guo and Ho. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Tongjun Guo

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.