Design, validation, and application of a 1K liquid chip for genome-wide association analysis in Alpine Merino sheep

Guo, Tingting; Yuan, Chao; Liu, Jianbin; Yang, Bohui

doi:10.3389/fvets.2025.1690580

ORIGINAL RESEARCH article

Front. Vet. Sci., 18 December 2025

Sec. Livestock Genomics

Volume 12 - 2025 | https://doi.org/10.3389/fvets.2025.1690580

This article is part of the Research TopicGenomic Selection and Evolution in Domestic AnimalsView all 10 articles

Design, validation, and application of a 1K liquid chip for genome-wide association analysis in Alpine Merino sheep

Tingting Guo^1,2,3^*

Chao Yuan^1,2,3

Jianbin Liu^1,2,3

Bohui Yang^1,2,3

¹Lanzhou Institute of Husbandry and Pharmaceutical Sciences, Chinese Academy of Agricultural Sciences, Lanzhou, China
²Sheep Breeding Engineering Technology Research Center of Chinese Academy of Agricultural Sciences, Lanzhou, China
³Key Laboratory of Animal Genetics and Breeding on Tibetan Plateau, Ministry of Agriculture and Rural Affairs, Lanzhou, China

Because wool is an important animal fiber and mutton is rich in nutrients, fine wool sheep are one of the most economically important livestock. China has been selecting and breeding high-quality fine wool sheep breeds since the 1950s and is the largest processing and consuming country for fine wool sheep. In this study, blood samples were collected from Alpine Merino sheep (AMS), Chinese Merino sheep (CMS), Aohan fine wool sheep (AHS), and Qinghai fine wool sheep (QHS). Genomic DNA was extracted and subjected to genome resequencing. GenoBait technology was used for probe design and site optimization, followed by synthesis of a low-density liquid-phase chip. Finally, 409 AMS were randomly selected to verify the chip. The results showed that a total of 1,012 single-nucleotide polymorphism sites (SNPs) were screened and retained for the synthesis of a low-density liquid chip for genome-wide selection and population genetics structure analysis. This chip can provide a useful tool for genome analysis of fine wool sheep and lay a solid foundation for subsequent breeding work.

Introduction

Fine-wool sheep form the cornerstone of the natural fiber industry, with their fine wool serving as a vital raw material for high-end textiles. Given their economic value, wool yield and quality have become core breeding objectives for fine-wool sheep, prompting the development of specialized genetic evaluation frameworks for these traits. The introduction of genomic selection technology into fine-wool sheep breeding has significantly accelerated genetic progress in economic traits, encompassing disease resistance, growth performance, and wool characteristics. Genome-wide selection enhances the precision of traditional genetic evaluations by integrating genomic information, enabling more accurate parameter estimation. Since Illumina launched the first sheep chip (Ovine SNP 50 Genotyping BeadChip v1) in 2009, multiple commercial sheep chips have been successfully developed. With advances in high-throughput sequencing and data analysis technologies, liquid-phase breeding chips based on targeted capture sequencing (genotyping by target sequencing, GBTS) have experienced rapid development as a new technology in recent years (1). GBTS technology is a technique for deep resequencing of target sites. The technical system is completely based on independent research and development results, is not restricted by foreign patented technology, does not rely on foreign equipment, and has the advantages of flexible operation, accuracy, reliability, low cost, and a wide application range. GBTS offers multiple advantages, including flexible marker addition and deletion and highly efficient detection, and has been widely applied in genomic selection for livestock species such as chicken (2), pigs (3), cattle (4), and aquatic animals (5). Whole genome selection breeding (GS) is a more efficient and accurate breeding method that uses high-density markers covering the whole genome for selective breeding after traditional breeding and molecular marker-assisted breeding (MAS). It can shorten the generation interval through the implementation of early selection, improve genetic estimated breeding value (GEBV), accelerate the genetic progress, and truly realize genomic technology to guide breeding practices. Due to the high cost of resequencing and high-density solid-phase gene chip detection, it is difficult for researchers and breeding companies to perform large-scale genome-wide selection breeding. Consequently, the development of low-density liquid-phase chips based on GBTS has become an inevitable trend. This study developed and designed a 1K functional marker site set of fine wool sheep using single nucleotide polymorphisms (SNPs) related to important economic traits of fine wool sheep and applied it to genome-wide association analysis (GWAS), which not only verifies the selected functional marker sites but also provides a technical means for functional markers, candidate gene screening, and genomic selection of fine wool sheep.

Materials and methods

Sample collection

All experimental protocols and procedures were approved by the Institutional Animal Care and Use Committee of the Lanzhou Institute of Husbandry and Pharmaceutical Science of the Chinese Academy of Agricultural Sciences (approval no. NKMYD201805; approval date: October 18, 2018). Whole-genome resequencing data from the chip design section were collected from 460 individuals in four fine wool sheep populations, including 220 Alpine Merino sheep (AMS; 75 males and 145 females; Huangcheng, Gansu, China), 120 Chinese Merino sheep (CMS; 60 males and 60 females; Gongnaisi, Xinjiang, China), 60 Aohan fine wool sheep (AHS; 30 males and 30 females; Chifeng, Inner Mongolia, China), and 60 Qinghai fine wool sheep (QHS; 30 males and 30 females; Sanjiaocheng, Gangcha, China). All the sheep were randomly selected without pedigree information. Blood samples for microarray validation were obtained from 409 additional AMS females from Huangcheng, Gansu, China. Blood and wool samples were collected from all individuals.

Phenotypic data acquisition

Birth, weaning (3.5 months), yearling (12 months), and adult (30 months) weights of all sheep were measured using an electronic scale, and sex was recorded. Birth weights were recorded within 0.5 h of birth, and weaning, yearling, and adult weights were measured after 12 h of fasting. Wool samples were subjected to objective testing at the Quality Supervision, Inspection, and Testing Center for Animal Fur and Products of the Ministry of Agriculture (Lanzhou, China). The testing of wool traits was carried out according to relevant China national standards: Staple length (SL) (GB/T 6976-2007), clean fleece weight rate (CFWR) (GB/T 6978-2007), staple strength (SS) (GB/T 13835.5-2009), fleece extension rate (FER) (GB/T 13835.5-2009), fiber diameter (FD) (GB/T 10685-2007), and coefficient of variation of fiber diameter (FD-CV) (GB/T 10685-2007). Phenotypic data on body weight, shearing volume, and horn type were collected from the fine wool sheep production performance measurement field and shearing site. All data obtained from the tests and measurements were analyzed using SPSS software (version 20.0; SPSS Inc., Chicago, IL, USA) for basic statistics, including maximum value (Max), minimum value (Min), mean value (Mean), and coefficient of variation (CV). All phenotypic data were tested for normal distribution and outliers were excluded, and then analyzed and plotted by Pearson correlation using the cor function in R (version 4.0.3) (6, 7).

Selection of candidate SNPs from whole genome resequencing

Blood samples were subjected to 5 × resequencing using a HiSeq X Ten (Illumina, San Diego, CA, USA). The detailed sequencing information can be found in our previous study (8, 9). High-quality sequencing data were aligned to the reference sheep genome assembly Oar_v4.0 (GCF_000298735.2) using the Burrows-Wheeler Aligner (BWA) software (v0.7.11) (10) (parameter: mem-t 4-K 32-M). Duplicates were removed using SAMtools (10) (parameter: rmdup). Finally, the sample alignment rate were analyzed statistically. We used SAMtools (11) to detect SNPs in the population samples and obtain high-quality SNPs through the following filtering and screening approach: (i) the support number (coverage depth) of SNP was >3; (ii) the proportion of MIS (missing) was < 10%; (iii) minimum allele frequency (MAF) was >5%. ANNOVAR is an efficient software tool for functionally annotating genetic variants detected across multiple genomes (12). Therefore, we used the ANNOVAR package (Version:2013-05-20) to annotate the SNPs.

Whole-genome resequencing (WGS) can detect a large number of SNPs information through sequence alignment. Correlations between the SNPs and traits were tested using mixed linear models in GEMMA software (13, 14). The statistical analysis model used in this study wasy = Xβ+Zγ+Wδ+e, where y is the phenotypic trait, Xβ is the matrix of the fixed effects, which include gender, the first three principal components, variety, and field. Zγ is a matrix of random effects, γ represents random effects ( $V a r (γ) = {σ_{g}}^{2} K$ , K is the kinship matrix). Wδ is a matrix of random effects, and e is the random error with the distribution e~N (0, Iσe²), among them I is the identity matrix, σe² is the unknown residual. Population genetic structure and sex were used as fixed effects, and individual kinship was used as a random effect to correct for the influence of population structure and individual kinship. The significance threshold for GWAS was defined using the Bonferroni correction method. The total type I error rate was controlled at 5%, and the significance threshold of the genome was 0.05/Nsnp, where Nsnp is the number of SNPs remaining after quality control (15).

SNP site screening principle and probe design

To meet the requirements of chip development, the position information of the target site and the 200bp sequence upstream and downstream of the site were compared with the reference genome Oar_v4.0 (GCF_000298735.2) using blastn (parameter: value 1e⁻⁵) (16). The accuracy of the site location information was checked by selecting the comparison result with the highest score according to the score value of the comparison. After the accuracy of the target site position information was verified, the probes were designed using Geno Baits Probe Designer software, with probe length set at 110 nt and GC content >30%. Every SNP site covered by at least three liquid-phase capture probes. Probes selected with GC content between 30% and 80%; Calculate the number of homology regions, and then pick the probes with homology regions < 10; The selected region should not contain probes in SSR or N regions; Filter out the final probe sequence to confirm the coverage of the site (probe design parameters: -len 110–110-gc 30–80-hom 5–d 2 -size 120–dis 20). Biotin-labeled polynucleotide probes were synthesized using in situ chip synthesis technology. Subsequently, these probes hybridize with target genomic regions to form double strands. Streptavidin-coated magnetic beads then adsorb biotinylated molecules. Following elution, amplification, and sequencing, SNP genotypes are identified.

Performance evaluation of the 1K liquid chip

Four hundred and three samples were used for DNA library construction and hybridization capture sequencing to test the 1K liquid chip. Raw data obtained by sequencing were filtered using fastp (version 0.20.0, parameter: -n 10 -q 20 -u 40) to remove the adapter sequence and paired reads containing N content more than 10% and low-mass (Q ≤ 20) bases that exceed 40%. The BWA software (MEM comparison method) (v0.7.11) (10) was used to compare with the reference genome sequence (Oar_v4.0, GCF_000298735.2). The BAM files were sorted using Samtools software (v1.18) (11) and then removed duplicate sequences using GATK software (v4.1.4.1) (17).

The quality of the targeted sequencing data was assessed using Picard's CollectHsMetrics (18) was used to calculate metrics including Fold-80 penalty and the fraction of target bases achieving at least 10 × coverage.

GCTA (v1.92.2) was used to conduct principal component analysis (PCA) of the SNPs in all samples (19). Haploview was used to calculate the LD size (r2) between two pairs of markers and to plot the change with distance (20). NJ tree was constructed using MEGA-X software (model: p-distance; bootstrap: 1,000 times) (21). Population structure analysis was performed using Admixture (v1.3) (22), assuming that the number of groups (K value) of the sample was 1–15, and then clustered according to the Cross-Validation Error (CV error); the smallest K value corresponded to the optimal number of groups. TASSEL was used to calculate common GWAS models (GLM, GLM(Q), MLM(K), and MLM(QK) models) (23). The population structure matrix corresponding to the optimal K value from Admixture was used as the Q matrix of the corresponding model, and the inter-sample affinity matrix calculated using gcta software was used as the K matrix of the corresponding model. The p-value obtained by the calculation was -log10 and displayed in the Manhattan and Q-Q plots.

Results

Design of the 1K liquid chip

A total of 988 functional markers related to wool traits, 672 functional markers related to weight traits, 301 integrated functional markers related to important economic traits of fine wool sheep reported in the literature and OMIA (Online Mendelian Inheritance in Animals), and 1,961 functional labels were used in this study. After site checking, conversion of reference genome and data cleaning (format adjustment, removal of duplicate sites, removal of sites located on chromosome 27), 1,875 sites remained. For 620 samples as research group, A total of 722 sites with minor allele frequency (MAF) < 0.05, missing rate >20%, and non-binary sites with heterozygosity ratio >80% were removed. Finally, 1,012 SNPs were retained after probe design and screening, and the probe design coverage was 63.36%. The 1,012 SNPs distribution plots of the probe design are shown (site distribution parameter w 500000-n 1) (Figure 1A). After the synthesis of the functional marker probe was completed, six out of 409 samples were randomly selected for testing, and the final sample site capture efficiency was between 99.07% and 99.58%. Probe synthesis to identify functional markers has been successful, and the site capture efficiency can be used for large-population genotyping.

Figure 1

Three pairs of plots are shown. Each pair consists of a Manhattan plot on the left and a QQ plot on the right. Plot A shows CFWA data with chromosomes on the x-axis and -log10(p) on the y-axis in both plots. Plot B represents FD data, similarly structured. Plot C displays FD_CV data, with noticeable peaks in the Manhattan plot and deviations in the QQ plot. The QQ plots show observed versus expected -log10(p) values. Green and yellow points indicate data points in all plots.

Figure 1. Sites of 1K liquid chip. (A) Location map of chromosome distribution. (B) Sample statistic. NA_rate ≤ 5% is the sample detection miss rate meeting the requirements; the lower the NA_rate, the fewer the number of undetected sites in the sample.

Genotyping performance of the 1K liquid chip

All 403 samples were sequenced using the 1K liquid chip yielded satisfactory results. After filtering, a total of 573,482,034 clean reads were obtained from 578,704,792 raw reads. A total of 86,805,718,800 bp of raw bases was obtained, and 82,796,415,340 bp of clean bases were obtained after filtration, with an effective rate of 91.58–96.74%, Q20 of 97.05–98.64%, and Q30 of 90.1–95.37%. Homogeneity and on-target rate jointly determine the efficiency of targeted sequencing. The uniformity of coverage and capture efficiency were quantified, with the Fold-80 penalty at 2.3 and the fraction of targets covered≥10 × at 99.24%. A total of 505,741,155 alignments were matched to the reference sequence across all clean reads, with a maximum sample alignment rate of 90.33% and a minimum rate of 86.85%. There were 25 samples with missing sites >5, accounting for 6% of the total sample size. The deletion rate at this site was between 0.08% and 1.27%. The number of heterozygous sites ranged from 189 to 359, and the heterozygosity rate ranged from 16.07% to 30.53%. Furthermore, the number of homozygous sites was 104–193, whereas the heterozygous rate ranged from 8.84% to 16.41%. The number of functional sites consistent with the reference sequence in the sample ranged from 692 to 816, and the ratio of the number of functional sites consistent with the reference sequence to the total number of non-missing sites ranged from 58.84 to 69.62 (Figure 1B).

Application of the 1K liquid chip

Marking quality control

Rare alleles, high deletion rates, and high heterozygosity alleles can cause abnormalities in population analysis and GWAS; therefore, this study filtered the marker sites in the original data and removed sites with minor allele frequencies (MAF) of < 0.05, sites with deletion rates >20%, sites with heterozygous ratios >80%, and non-binary allele sites. Finally, 1012 SNPs were labeled and filtered, and 722 SNPs were retained for subsequent analysis.

Principal component analysis and phylogenetic tree construction

Based on the filtered 722 SNP markers, this project conducted principal component analysis (PCA) to obtain the variance interpretation rate of each principal component (PC) and the score matrix of the sample in each PC. Key information extracted from the SNP information was divided into PC1, PC2, PC3, etc., according to the effects from large to small, and the scatter plot of the first three PCs pairwise is shown in Figure 2. The variance interpretation rates for PC1, PC2, and PC3 were 3.76%, 2.99%, and 2.83%, respectively. This study fully utilized the 1K liquid chip sequencing data from 403 AMS to construct a phylogenetic tree (Figures 3A, B) and population structure (Figures 3C–E), indicating relatively close genetic distances.

Figure 2

Panel A shows a diagram of chromosomes labeled one to twenty-six plus X, with various colored marks indicating data points at different megabases. Panel B presents a line graph displaying rates of NA, Het_alt, Hom_alt, and Ref across samples 90313 to Y28421718, with percentages on the y-axis ranging from zero to seventy.

Figure 2. (A–D) PCA sample clustering diagram.

Figure 3

Three panels display Manhattan and QQ plots. Panel A shows association data for CFWA with moderate peaks on chromosomes 6 and 16. Panel B depicts FD data with notable peaks on chromosomes 5 and 17. Panel C, for FD_CV, reveals subtle peaks on chromosome 5. The QQ plots to the right of each panel indicate observed versus expected p-values, showing deviation from the expected line, suggesting potential associations.

Figure 3. Population structure analysis. (A) CV error. (B) Phylogenetic tree. (C) Kinship heat map. (D) Frequency distribution of kinship relationships. (E) Histogram of the genetic composition of samples with different numbers of subgroups.

Genome-wide association analysis

Before conducting GWAS, the phenotypic data were tested for extreme values, and the 3δ rule was used to eliminate phenotypic extremes, i.e., extremes outside the range of plus or minus three standard deviations of the mean value were eliminated. After calculating the phenotypic data used in this analysis according to the above rule, no abnormal samples were found for each trait. The descriptive statistics of the phenotypic values are shown in Table 1 and Figure 4.

Table 1

Table 1. Descriptive statistics of phenotypic values.

Figure 4

Three sets of scatter plots and QQ plots depict genetic data. Panel A: Manhattan plot for CFWA showing chromosomal variations alongside a QQ plot for CFWR, with most data aligning along the expected line. Panel B: Manhattan plot for FD follows, with its corresponding QQ plot for FD displaying a similar pattern. Panel C: Manhattan plot for FD_CV exhibits greater variation, and the QQ plot for FD_CV shows data points deviating more noticeably from the line. Each panel analyzes markers across chromosomes 1 to 25 and XMT.

Figure 4. Boxplot of phenotypic values.

The four models commonly used in GWAS models (GLM, GLM(Q), MLM(K), and MLM(Q) models) were calculated using TASSEL software. The population structure matrix corresponding to the optimal K value of Admixture was used as the Q matrix of the corresponding model, and the inter-sample affinity matrix calculated by the gcta software was used as the K matrix of the corresponding model. The p-value obtained by the calculation was -log10 and Manhattan and Q-Q plots are plotted (Figures 5A–7C). The four models obtained significant SNP numbers related to cleaning rate, wool fiber diameter, and wool fiber diameter coefficient of variation of 27, 47, and 71, respectively, and the number of candidate genes was 30, 49, and 60, respectively, after annotation (Table 2, Figures 5–8).

Figure 5

Composite image with multiple visualizations. A) Line graph showing CV error decreasing as K increases from 1 to 15. B) Circular dendrogram categorizing into five color-coded groups. C) Heatmap with values -0.2 to 0.8, showing group clustering. D) Histogram of kinship with a range from -0.4 to 1.0, peaking around -0.2. E) Bar graphs displaying genetic structure from K=1 to K=15 with various color distributions.

Figure 5. (A–C) GLM model Manhattan and QQ charts.

Figure 6

Three vertical box plots comparing CFWR, FD, and FD_CV. The CFWR plot ranges from 50 to 70, centered around 60. The FD plot ranges from 15 to 22.5, centered around 17.5. The FD_CV plot ranges from 20 to 60, centered around 30. Each plot has outliers represented by dots.

Figure 6. (A–C) GLM (Q) model Manhattan and QQ charts.

Figure 7

Three panels show genomic data. Panel A displays a Manhattan plot with dots on chromosomes, representing CFWA, and a corresponding QQ plot on the right indicating observed vs. expected p-values. Panel B shows FD data similarly. Panel C shows FD_CV data in the same format. Each plot uses green and yellow dots, with the y-axis representing negative log p-values. The QQ plots include a red line indicating correlation.

Figure 7. (A–C) MLM (K) model Manhattan and QQ charts.

Table 2

Table 2. Significant SNPs and genes in different models.

Figure 8

Scatter plots labeled A, B, and C show data distribution for groups one to five based on principal components PC1, PC2, and PC3. Plot D is a 3D scatter plot illustrating the same groups across PC1, PC2, and PC3 dimensions. Each group is color-coded differently to differentiate within the plots.

Figure 8. (A–C) MLM (QK) model Manhattan and QQ plots.

A total of 67 candidate genes were annotated based on the 100 kb region upstream and downstream of the significantly correlated SNP site, including 12 candidate genes related to fiber diameter, namely BICC1, CAPN2, ELOVL2, ELOVL5, ERAP1, ERAP2, JAZF1, MFSD1, PLCB1, RNF144B, RNF43, and UBE2E1. There were 13 candidate genes associated with the coefficient of variation of fiber diameter, which included CDH1, CAPN2, CDH1, CRABP1, CRYBA4, DDX27, FADS2, IREB2, LCORL, LDB2, METTL16, SDR42E1, and WDR49. Twelve candidate genes were associated with net gross rate, which included LZTS1, MYO1E, RABEP1, MGST2, SRRM4, SUDS3, SLC1A1, EGFR, FADS2, RBBP8, PRRX1, and GRSF1.

The results of GO enrichment analysis on these candidate genes showed that the main enriched biological processes included carboxylic acid metabolism, cell response to nitrogen compounds, cell response to organic nitrogen compounds, cell response to oxygen-containing compounds, embryo development, lipid catabolic process, organic acid metabolism process, oxygen acid metabolism process, response to nitrogen compounds, response to organic nitrogen compounds, and oxygen acid metabolism process. The cellular components have membrane microdomains and perinuclear regions of the cytoplasm, and the molecular function is lipid binding (Figure 9).

Figure 9

Three bar charts labeled A, B, and C compare gene numbers across categories: Biological Process (green), Cellular Component (blue), and Molecular Function (orange). Chart A (CFWR) shows a high count in Molecular Function, notably Binding. Chart B (FD) emphasizes Cellular Component with peaks in Cell Part and Biological Process categories. Chart C (FD-CV) highlights fewer categories overall, focusing on Molecular Function like Binding and Biological Process processes such as Metabolic. The x-axis lists specific functions, and the y-axis represents the number of genes.

Figure 9. (A–C) GO analysis of candidate genes.

The results of KEGG pathway enrichment showed that they were mainly enriched in biosynthesis of unsaturated fatty acids, α-linolenic acid metabolism, cell adhesion molecules, Rap1 signaling pathway, PPAR signaling pathway, HIPPO signaling pathway, Apelin signaling pathway, apoptosis, cellular senescence, GnRH signaling pathway, and gap junction. The pathways enriched in all three traits were biosynthesis of unsaturated fatty acids, adhesion, and the Rap1 signaling pathway.

Discussion

In recent years, molecular marker technology has undergone a series of revolutionary changes in terms of the quantity, variety, throughput, and cost of analysis (24). International multinational breeding companies have long regarded molecular breeding and its close integration with conventional breeding techniques as important modern breeding methods. Large breeding companies have a complete industrial chain and market share that can support high R&D investment and establish platforms, technologies, and methods that efficiently use molecular marker detection systems to support their breeding processes. However, small- and medium-sized breeding companies cannot establish efficient and low-cost molecular marker-assisted breeding systems, which has become an important bottleneck restricting molecular breeding (25).

One of the difficulties with genomic selection is the cost of genotyping, although with the development of microarray technology and high-throughput sequencing, the cost has been greatly reduced. However, in dairy cows, because of their high individual value and long era interval, the cost of typing is acceptable relative to the individual value of dairy cows; however, for species with small individual value and short era interval, it still means high cost and lacks sufficient economic benefits. The use of medium- to high-density genotyping chips in sheep has enabled genomic selection in sheep breeding since 2009, but the cost of genotyping remains high for many routine uses. Therefore, it is necessary to develop low-density genotyping chips to reduce costs. One way to solve this problem is to perform genotyping using low-density chips and then infer and “fill” in the missing SNPs from high-density chips or resequencing data (26). Filling can only produce effective results if it is based on a low recombination rate within one generation and repeated every three to four generations. A better filling effect can only be achieved when the number of SNPs is increased. However, when a higher number of SNPs on a low-density SNP chip leads to an increase in the number of genotypes used to identify the corresponding reference haplotype, the probability of randomly identifying haplotypes common to the reference and candidate populations decreases (27, 28). The 1K Liquid Chip of fine wool sheep developed in this study can provide a tool and technical instruments for low-cost, large-scale implementation of fine wool sheep genome selection.

GWAS technology has identified multiple genetic variants associated with weight, disease resistance, meat quality, and hairy traits in sheep, but only a few SNP loci sets significantly related to important economic traits of fine wool sheep have been reported. The SNP locus set significantly related to the important economic traits of fine wool sheep obtained in this study was screened from resequencing data by GWAS, and then collected from other important functional sites reported in the literature to form a 1K liquid chip related to important economic traits of fine wool sheep. To more accurately determine the SNPs related to fiber diameter, fiber diameter coefficient of variation, and cleaning rate of fine wool sheep, this study also used the 1K liquid chip of fine wool sheep to perform GWAS on AMS, and some clearer and more accurate important SNPs and functional genes were mined, indicating that the collection can be used for genome association analysis and candidate functional gene mining.

GWAS accuracy depends on the phenotypic accuracy employed in the analysis and the degree of computational model fit. In this study, two models, GLM and MLM, the common models of GWAS, were used for calculation, the population structure matrix corresponding to the optimal K value of Admixture was used as the Q matrix of the corresponding model, and the inter-sample affinity matrix calculated by gcta software was used as the K matrix of the corresponding model. GLM is a fixed-effect model, and MLM adds random effects, that is kinship matrices, in addition to fixed effects. The focus of GWAS is to mine significant loci; while the focus is on fixed effects, random effects are added to control false positives at sites that are significantly correlated with phenotype. In order to increase the accuracy and credibility of candidate gene mining, two models were used and different effects were added to perform four calculations. Finally, the results of the four models were intersected to obtain candidate genes related to the fiber diameter, fiber diameter coefficient of variation, and cleaning rate of fine wool sheep.

In this study, candidate genes FADS2, FADS3, TDRD15, CDH1, CRABP1, CRABP4, and DDX27 were annotated in wool fiber diameter and cleaning rate, and it was found that these candidate genes were closely related to biological mechanisms such as skin development, defense, regeneration, and proliferation. For example, FADS2 is very important for human skin, and dysfunction of sebaceous FADS2 activity may play a role in skin abnormalities associated with skin lipids (29). Studies have shown that branched-chain fatty acids (BCFAs) and straight/normal odd-chain fatty acids (n-OCFAs) are the main fatty acids in human skin lipids, especially in sebaceous gland (SG) wax esters. Skin lipids contain varying amounts of monounsaturated BCFAs and n-OCFAs, accounting for >20% of the total fatty acids in some reports. Fatty acid desaturase 2 (FADS2) encodes a versatile enzyme that catalyzes the desaturation of Δ4-, Δ6-, and Δ8-to 10 polyunsaturated fatty acids, but only one saturated fatty acid palmitate, converting it to 16:1n-10; FADS2 is inactive near 14:0 or 18:0 (30). TDRD15 was found to be significantly upregulated in the skin of patients with atopic dermatitis (AD) in comparison with the proteomic outcomes of normal hair follicles in patients with atopic dermatitis (31). AD, a common inflammatory skin disease characterized by skin and systemic inflammation as well as barrier dysfunction, suggests that the upregulation of the conserved tudor domain protein 15 (TDRD15) involves complex biological processes such as skin hydration processes, protease-proteasome complex interactions, lipid metabolism, antioxidants, and imbalances in inflammatory pathways. E-cadherin (CDH1) is a well-known cell-to-cell adhesion molecule that mechanically stretches to promote skin cell regeneration and proliferation. Mechanical stress is transmitted to the epidermis and induces downstream signaling pathways to induce epidermal cell differentiation, and CDH1-induced keratinocytes dedifferentiation is a key event in mechanical stretching-mediated skin regeneration. CDH1 can be used as a potential therapeutic target to promote skin regeneration (32). CRABP1 is dynamically expressed as a cell-binding protein for retinoic acid (RA) signaling during skin development and in adult tissues (33). Although RA signaling is essential for epidermal differentiation, the mechanism of action is unknown and the distribution of RA between different nuclear receptors is regulated by RA-binding proteins. CRABP1 is expressed in the stroma of embryonic dermis and skin tumors but limited to the follicular dermal papilla of normal postnatal skin, which is upregulated in response to RA treatment or Notch activation and is negatively regulated by Wnt/β-catenin signaling. In this study, candidate genes CRABP1 and CRABP4 were also identified, indicating that they are involved in the biological function of fine wool sheep skin tissue development and the key pathway of wool growth. Other skin-related candidate genes, such as DDX27, were also found in this study, and it was reported that DDX27 is related to pathways associated with the hyperpigmentation of chicken skin through genetic annotation using selective signal analysis (34). ELOVL5 is prominently expressed in the basal layer of sebaceous gland cells in the head skin of male goats, and the candidate genes for pheromone synthesis are mainly expressed in the sebaceous glands of the pheromone-producing skin region. The “male effect” is a well-known phenomenon in female sheep and goats that occurs due to pheromone-induced activation of reproductive function, and studies have found significant increases in gene expression in the extended long-chain fatty acid family member 5 (ELOVL5) while inducing pheromone synthesis; therefore, ELOVL5 is considered to be the main candidate gene for pheromone synthesis (35).

Conclusions

In this study, 1,012 SNPs associated with economically important traits in fine wool sheep were screened from resequencing data using GWAS and pooled into a 1K liquid chip for genome-wide association analysis. Four models yielded 145 significant SNPs, which were annotated to 139 candidate genes associated with the washing ratio, wool fiber diameter, and coefficient of variation of wool fiber diameter. GO and KEGG enrichment analyses showed that the biosynthesis of unsaturated fatty acids, adhesion, and the Rap1 signaling pathway were enriched in all three traits.

Data availability statement

The original contributions presented in the study are publicly available. This data can be found here: BioProject: PRJNA1162637, https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA1162637.

Ethics statement

The animal study was approved by Animal Ethics Committee at the Lanzhou Institute of Husbandry and Pharmaceutical Sciences, Chinese Academy of Agricultural Sciences (NO. 20231447). The study was conducted in accordance with the local legislation and institutional requirements.

Author contributions

TG: Conceptualization, Data curation, Funding acquisition, Methodology, Software, Writing – original draft. CY: Conceptualization, Validation, Writing – original draft. JL: Writing – review & editing. BY: Funding acquisition, Writing – review & editing.

Funding

The author(s) declare that financial support was received for the research and/or publication of this article. This research was funded by the Biological Breeding-Major Projects, grant number 2023ZD0405101; China Agriculture Research System, grant number CARS-39-02; and Innovation Project of the Chinese Academy of Agricultural Sciences, grant number 25-LZIHPS-07.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declare that no Gen AI was used in the creation of this manuscript.

Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Publisher's note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fvets.2025.1690580/full#supplementary-material

Abbreviations

AMS, Alpine Merino sheep; AHS, Aohan Fine Wool Sheep; BWA, Burrows-Wheeler Aligner; CMS, Chinese Merino Sheep; CFWR, Clean Fleece Weight Rate; FD, Fiber Diamer; FD-CV, Coefficient of Variation of Fiber Diameter; FER, Fleece Extension Rate; GEBV, Genetic Estimated Breeding Value; GBTS, Genotyping by Target Sequencing; GWAS, Genome-Wide Association Analysis; GS, Genome Selection; MAS, Molecular Marker-Assisted Breeding; MAF, Minimum Allele Frequency; QHS, Qinghai Fine Wool Sheep; SS, Staple Strength; WGS, Whole genome resequencing.

References

1. Xu YB, Yang QN, Zheng HJ, Xu YF, Sang YF, Guo ZF, et al. Genotyping by target sequencing (GBTS) and its applications. Scientia Agricultura Sinica. (2020) 53:2983–3004.

Google Scholar

2. Wang MY, Zhou CH, Xue Q, Yin JM, Jiang YX, Zhang HY, et al. Application effectiveness of “Youxin-1” in genetic diversity and structure analysis of local chickens. Hereditas. (2024) 46:640–8.

PubMed Abstract | Google Scholar

3. Liu YL, Qiu A, Zhang ZP, Wang X, Du HH, Luo WX, et al. The efficiency of haplotype-based genomic selection using genotyping by target sequencing in pigs. Scientia Agricultura Sinica. (2023) 57:2243–53.

Google Scholar

4. Lopes FB, Wu XL, Li H, Xu J, Perkins T, Genho J, et al. Improving accuracy of genomic prediction in Brangus cattle by adding animals with imputed low-density SNP genotypes. J Animal Breed Genet. (2018) 135:14–27. doi: 10.1111/jbg.12312

PubMed Abstract | Crossref Full Text | Google Scholar

5. Liu MY, Li XP, Kong J, Meng XH, Chen MJ, Luo K, et al. Application of the liquid chip “Yellow Sea Chip No.1” in genetic evaluation of the base population with resistance to acute hepatopancreaticnecrosis disease in Litopenaeus vannamei. J Fisher China. (2023) 47:217–26. doi: 10.11964/jfc.20220513508

Crossref Full Text | Google Scholar

6. R Core Team (2014). R: A Language and Environment for Statistical Computing. Vienna: R Foundation for Statistical Computing.

Google Scholar

7. Fox J, Weisberg S. An R Companion to Applied Regression. New York City: Sage Publications. (2018).

Google Scholar

8. Lu Z, Yue Y, Yuan C, Liu J, Chen Z, Niu C, et al. Genome-wide association study of body weight traits in Chinese fine-wool sheep. Animals. (2020) 10:170–170. doi: 10.3390/ani10010170

PubMed Abstract | Crossref Full Text | Google Scholar

9. Zhao H, Guo T, Lu Z, Liu J, Zhu S, Qiao G, et al. Genome-wide association studies detects candidate genes for wool traits by re-sequencing in Chinese fine-wool sheep. BMC Genomics. (2021) 22:127. doi: 10.1186/s12864-021-07399-3

PubMed Abstract | Crossref Full Text | Google Scholar

10. Li H, Durbin R. Fast and accurate short read alignment with burrows-wheeler transform. Bioinformatics. (2009) 25:1754–60. doi: 10.1093/bioinformatics/btp324

PubMed Abstract | Crossref Full Text | Google Scholar

11. Heng L, Bob H, Alec W, Tim F, Jue R, Nils H, et al. The sequence alignment/map format and SAMtools. Bioinformatics. (2009) 25:2078–9. doi: 10.1093/bioinformatics/btp352

PubMed Abstract | Crossref Full Text | Google Scholar

12. Kai W, Mingyao L, Hakon H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res. (2010) 38:e164. doi: 10.1093/nar/gkq603

PubMed Abstract | Crossref Full Text | Google Scholar

13. Yu J, Pressoir G, Briggs WH, Vroh Bi I, Yamasaki M, Doebley JF, et al. A unified mixed-model method for association mapping that accounts for multiple levels of relatedness. Nat Genet. (2006) 38:203–8. doi: 10.1038/ng1702

PubMed Abstract | Crossref Full Text | Google Scholar

14. Xiang Z, Matthew S. Genome-wide efficient mixed-model analysis for association studies. Nat Genet. (2012) 44:821–4. doi: 10.1038/ng.2310

PubMed Abstract | Crossref Full Text | Google Scholar

15. Kristin N, Wenlei L, Gary C, Ya-Yu T, Daniele FM. Comparison of type I error for multiple test corrections in large single-nucleotide polymorphism studies using principal components versus haplotype blocking algorithms. BMC Genet. (2005) 6:S78. doi: 10.1186/1471-2156-6-S1-S78

PubMed Abstract | Crossref Full Text | Google Scholar

16. Christiam C, George C, Vahram A, Ning M, Jason P, Kevin B, et al. BLAST+: architecture and applications. BMC Bioinformatics. (2009) 10:421. doi: 10.1186/1471-2105-10-421

PubMed Abstract | Crossref Full Text | Google Scholar

17. Mckenna A, Hanna M, Banks E, Andrey S, Kristian C, Andrew K, et al. The genome analysis toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. (2010) 20:1297–303. doi: 10.1101/gr.107524.110

PubMed Abstract | Crossref Full Text | Google Scholar

18. Broad Institute. Picard Toolkit. GitHub Repository. (2019). Available online at: http://broadinstitute.github.io/picard/ (Accessed August 26, 2022).

Google Scholar

19. Yang J, Lee SH, Goddard ME, Visscher PM. GCTA: a tool for genome-wide complex trait analysis. Am J Human Genet. (2011) 88:76–82. doi: 10.1016/j.ajhg.2010.11.011

PubMed Abstract | Crossref Full Text | Google Scholar

20. Barrett JC, Fry B, Maller J, Daly MJ. Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics. (2005) 21:263–5. doi: 10.1093/bioinformatics/bth457

PubMed Abstract | Crossref Full Text | Google Scholar

21. Sudhir K, Glen S, Michael L, Christina K, Koichiro T. MEGA X: molecular evolutionary genetics analysis across computing platforms. Mol Biol Evol. (2018) 35:1547–9. doi: 10.1093/molbev/msy096

PubMed Abstract | Crossref Full Text | Google Scholar

22. Alexander DH, Novembre J, Lange K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. (2009) 19:1655–64. doi: 10.1101/gr.094052.109

PubMed Abstract | Crossref Full Text | Google Scholar

23. Bradbury PJ, Zhang Z, Kroon DE, Casstevens TM, Ramdoss Y, Buckler ES. TASSEL: software for association mapping of complex traits in diverse samples. Bioinformatics. (2007) 23:2633–5. doi: 10.1093/bioinformatics/btm308

PubMed Abstract | Crossref Full Text | Google Scholar

24. Qing Y, Jianan Z, Xiaolei S, Lei C, Jun Q, Mengchen Z, et al. Development of SNP marker panels for genotyping by target sequencing (GBTS) and its application in soybean. Mol Breed. (2023) 43:26–26. doi: 10.1007/s11032-023-01372-6

PubMed Abstract | Crossref Full Text | Google Scholar

25. Xu Y, Crouch JH. Marker-assisted selection in plant breeding: From publications to practice. Crop Sci. (2008) 48:391–407. doi: 10.2135/cropsci2007.04.0191

Crossref Full Text | Google Scholar

26. Huang Y, Hickey JM, Cleveland MA, Maltecca C. Assessment of alternative genotyping strategies to maximize imputation accuracy at minimal cost. Genet Select Evol. (2012) 44:25. doi: 10.1186/1297-9686-44-25

PubMed Abstract | Crossref Full Text | Google Scholar

27. Dassonneville R, Fritz S, Ducrocq V, Boichard D. Imputation performances of 3 low-density marker panels in beef and dairy cattle. J Dairy Sci. (2012) 95:4136–40. doi: 10.3168/jds.2011-5133

Crossref Full Text | Google Scholar

28. Carvalheiro R, Boison SA, Neves HH, Sargolzaei M, Schenkel FS, Utsunomiya YT, et al. Accuracy of genotype imputation in Nelore cattle. Genet Select Evol. (2014) 46:69. doi: 10.1186/s12711-014-0069-1

PubMed Abstract | Crossref Full Text | Google Scholar

29. Conny T, Peter R, Muchtaruddin M, Zakiudin M, Herawati S, Suzanna I, et al. Study protocol to investigate the environmental and genetic aetiology of atopic dermatitis: the Indonesian Prospective Study of Atopic Dermatitis in Infants (ISADI). BMJ Open. (2017) 7:e012475. doi: 10.1136/bmjopen-2016-012475

PubMed Abstract | Crossref Full Text | Google Scholar

30. Wang Z, Park HG, Wang DH, Kitano R, Kothapalli KS, Brenna JT. Fatty acid desaturase 2 (FADS2) but not FADS1 desaturates branched chain and odd chain saturated fatty acids. Biochimica et Biophysica Acta. (2020) 1865:158572. doi: 10.1016/j.bbalip.2019.158572

PubMed Abstract | Crossref Full Text | Google Scholar

31. Paola M, Marco G, Caterina G, Stefano D, Luigi B, Brandusa PA, et al. Proteomic analysis from skin swabs reveals a new set of proteins identifying skin impairment in atopic dermatitis. Exp Dermatol. (2021) 30:811–9. doi: 10.1111/exd.14276

PubMed Abstract | Crossref Full Text | Google Scholar

32. Xiaolu H, Xiao L, Yiwen Z, Haizhou L, Hengyu D, Yinjun S, et al. CDH1 is identified as a therapeutic target for skin regeneration after mechanical loading. Int J Biol Sci. (2021) 17:353–67. doi: 10.7150/ijbs.51309

PubMed Abstract | Crossref Full Text | Google Scholar

33. Collins CA, Watt FM. Dynamic regulation of retinoic acid-binding proteins in developing, adult and neoplastic skin reveals roles for β-catenin and Notch signalling. Dev Biol. (2008) 324:55–67. doi: 10.1016/j.ydbio.2008.08.034

Crossref Full Text | Google Scholar

34. Haobin H, Xiaoliang W, Caiyun Z, Yingying T, Wenwei L, Xia C, et al. Genomic analysis of GBS data reveals genes associated with facial pigmentation in Xinyang blue-shelled layers. Arch Animal Breed. (2020) 63:483–91. doi: 10.5194/aab-63-483-2020

PubMed Abstract | Crossref Full Text | Google Scholar

35. Kitago M, Momozawa Y, Masuda K, Wakabayashi Y, Date-Ito A, Hagino-Yamagishi K, et al. Localization of the candidate genes ELOVL5 and SCD1 formale effect'pheromone synthesis in goats (Capra hircus). Journal of Reproduction and Development. (2007) 53:1329–33. doi: 10.1262/jrd.19060

Crossref Full Text | Google Scholar

Keywords: fine wool sheep, targeted sequencing, genotyping, liquid chip, genome-wide association analysis

Citation: Guo T, Yuan C, Liu J and Yang B (2025) Design, validation, and application of a 1K liquid chip for genome-wide association analysis in Alpine Merino sheep. Front. Vet. Sci. 12:1690580. doi: 10.3389/fvets.2025.1690580

Received: 22 August 2025; Revised: 05 November 2025;
Accepted: 10 November 2025; Published: 18 December 2025.

Edited by:

Xuexue Liu, UMR5288 Anthropologie Moleculaire et Imagerie de Synthese (AMIS), France

Reviewed by:

Xiaoyun He, Chinese Academy of Agricultural Sciences, China
Zhuqing Zheng, Huazhong Agricultural University, China

Copyright © 2025 Guo, Yuan, Liu and Yang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Tingting Guo, Z3VvdGluZ3RpbmdAY2Fhcy5jbg==

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.