Genome-wide association study to identify SNPs and candidate genes associated with body size traits in donkeys

The Yangyuan donkey is a domestic animal breed mainly distributed in the northwest region of Hebei Province. Donkey body shape is the most direct production index, can fully reflect the donkey’s growth status, and is closely related to important economic traits. As one of the main breeding selection criteria, body size traits have been widely used to monitor animal growth and evaluate the selection response. Molecular markers genetically linked to body size traits have the potential to accelerate the breeding process of animals via marker-assisted selection. However, the molecular markers of body size in Yangyuan donkeys have yet to be explored. In this study, we performed a genome-wide association study to identify the genomic variations associated with body size traits in a population of 120 Yangyuan donkeys. We screened 16 single nucleotide polymorphisms that were significantly associated with body size traits. Some genes distributed around these significant SNPs were considered candidates for body size traits, including SMPD4, RPS6KA6, LPAR4, GLP2R, BRWD3, MAGT1, ZDHHC15, and CYSLTR1. Gene Ontology and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses indicated that these genes were mainly involved in the P13K-Akt signaling pathway, Rap1 signaling pathway, regulation of actin cytoskeleton, calcium signaling pathway, phospholipase D signaling pathway, and neuroactive ligand-receptor interactions. Collectively, our study reported on a list of novel markers and candidate genes associated with body size traits in donkeys, providing useful information for functional gene studies and offering great potential for accelerating Yangyuan donkey breeding.


Introduction
Domestic donkeys (Equus asinus) have facilitated the movement of goods and people for millennia (Todd et al., 2022), providing a significant contribution to the development of human societies (Seyiti and Kelimu, 2021). With more than 4,000 years of donkey breeding history, China has one of the most abundant donkey genetic resources in the world (Wang et al., 2022). Yangyuan donkeys are mainly distributed in the northwestern part of Hebei Province of China, and are a medium-sized meat-and-servitude breed that was developed through long-term terroir domestication and generations of selective breeding by local farmers. It has been listed as one of 24 locally protected donkey breeds. Donkey populations have declined significantly over the past decade worldwide (Seyiti and Kelimu, 2021), and many local donkey breeds with excellent performance are currently threatened with extinction (Zeng et al., 2019), including the Yangyuan donkeys. Considering marginal livestock production worldwide (Maestrini et al., 2020), donkey products have become scarce and expensive. Therefore, breeding high-yielding varieties to improve the production performance of meat , milk (Souroullas et al., 2018), and hides (Bennett and Pfuderer, 2020) with high economic and nutritional value may be an effective way to promote the sustainable development of the donkey industry.
As a complementary tool for conventional breeding, molecular marker-assisted selection has accelerated the breeding process of various domestic animals (Christensen and Lund, 2010;Ding et al., 2017). The increasing use of highthroughput sequencing technology has made single nucleotide polymorphisms (SNPs) highly attractive as molecular markers, which have been widely used in the field of animal breeding (Choi et al., 2017;Tsai et al., 2020). Genome-wide association studies (GWASs) are currently used to examine SNPs to determine genetic factors that affect quantitative traits (Wang W. et al., 2016). In addition, GWASs can simultaneously test millions of SNPs in comparatively wide chromosomal regions by comparing the frequencies of genetic variants in phenotypically different individuals and can estimate whether the locus is associated with the target trait (Dai et al., 2013;Gorlova et al., 2022). Due to advances in next-generation sequencing (NGS), GWAS has been carried out extensively on various livestock, such as pigs (Wu et al., 2019), cattle (Pausch et al., 2017), and sheep . However, little in-depth breeding research has been conducted on donkeys due to their undervalued status. Previous studies have started uncovering the genetic basis of body size variations in donkeys, revealing a limited number of polymorphic sites of candidate genes, including IGF1 (Lai et al., 2021), CYP4A11 (Wang et al., 2022), and TBX3 ; however, a large amount of genomic information is waiting to be explored.
In this study, we detected significant SNP markers and candidate genes associated with body size traits within a population of 120 Yangyuan jennies using GWAS. GO and the KEGG pathway analyses were conducted. Our results may serve as a reference for molecular marker-assisted breeding in donkeys.

Sample collection and phenotyping
A total of 120 donkeys were selected for whole-genome sequencing in Yangyuan County, Hebei Province, China. All donkeys were adult jennies with unknow pedigree information. The animal feed consisted of grass and hay ad libitum and water. Blood samples were taken from the jugular vein and rapidly stored at −80°C. Genomic DNA was extracted from the blood using a TIANamp genomic DNA kit (TIANGEN, DP304, China). The quality and integrity of the extracted DNA were examined based on the A260/A280 ratio and 1% agarose gel electrophoresis. We also conducted measurements of the body height, body length, chest circumference, and shin girth. EXCEL 2019 was used to generate the descriptive statistics of the four body size traits.

Sequencing and quality control
Genomic libraries with an insert size of~400 bp were constructed and sequenced using an Illumina NovaSeq instrument (2 × 150 paired-end mode), which was conducted by the Personalbio Biotechnology Co., Ltd. (Shanghai, China). FastQC was used to provide quality control checks on the raw sequence data (http://www.bioinformatics.babraham. ac.uk/projects/fastqc). Fastp (Chen et al., 2018) was used to perform data filtering to provide clean data for downstream analysis with the following criteria: 1) removal of the joint contamination at the 3′end; 2) sliding window quality pruning, where if the average Q value ≤ 20, then the bases in the window would be marked as discarded and the window would be moved forward by one base; and 3) if the length of any of the doubleend reads was ≤50 bp, the double-end reads would be removed.

Mapping
High-quality trimmed read pairs were aligned to the reference donkey genome (GenBank: PRJNA431818) by Burrows-Wheeler Aligner (BWA, versin0.7.12-r1039) with default parameters and SAM files were generated (Li and Durbin, 2009). The SAM files were sequenced using Picard1.107 software. Duplicates were removed using the MarkDuplicates command in the Picard package. Thus, if multiple paired reads had the same chromosome coordinates after comparison, only the paired reads with the highest score would be retained, and the reads near InDel were most prone to mapping errors. To minimize SNPs caused by mapping errors, it was necessary to rematch the reads near InDel to improve the accuracy of SNP calling, using the IndelRealigner command in the Genome Analysis Toolkit (GATK) program to rematch all of the reads near InDel to improve the accuracy of SNP prediction. SNP prediction accuracy was improved by using the IndelRealigner command in the GATK program to recompare all of the reads near InDel. For all 120 samples, mapping statistics based on high-quality mapped reads for each accession included the average coverage depth and the proportion of the donkey genome covered by different read depths.

Population structure and phylogenetic analysis
PCA of the 120 samples was conducted with GCTA (Yang et al., 2011) software. To evaluate the phylogenetic relationships between individuals, a maximum likelihood (ML) tree was built based on the SNPs using Fasttree software (Price et al., 2009), with n = 1,000 bootstrap replications. We used PopLDdecay (Zhang C. et al., 2019) software with default parameters to calculate linkage disequilibrium (LD) decay.

GWAS and candidate gene search
GWASs for body size traits, including body height, body length, chest circumference, and shin girth, were performed using linear mixed models (LMMs) by the efficient mixed-model association expedited (EMMAX) software (http://genetics.cs.ucla.edu/emmax.) (Kang et al., 2010). The model is where y is a vector of phenotypic records (body height, body length, chest circumference or shin girth); X represents the vector of the covariates (the PCA principal components obtained from the analysis of population structure), including a column of 1; α represents the vector of the corresponding coefficients, including the intercept; Z represents the vector of the marker genotypes, β represents the effect size of the marker; Wμ represents the vector of the random polygenic effects, with μ~(0, Gσ 2 a ), where σ 2 a is the genetic variance, and G represents the genomic relationship matrix constructed with identity-by-state (IBS); ε is the vector of random residuals with ε~N (0, I σ 2 ), where σ 2 is the residual variance.
The Manhattan and quantile-quantile plot (Q-Q) plots were drawn by R packages. The SNPs reached the genome-wide significant level -log10 (p-value) = 7 were determined as the significant loci. The search for candidate genes was extended from 50 kb upstream and downstream from the significant SNPs.

GO and KEGG pathway enrichment analyses
To provide insight into the functional enrichment of the candidate genes, we performed GO enrichment analysis (Ashburner et al., 2000) and the KEGG pathway (Kanehisa et al., 2004). GO enrichment analysis was performed using InterProScan. A hypergeometric p-value was calculated where the background was set as genes in the entire genome. GO terms with p-value <0.05 were considered significantly enriched, and GO enrichment analysis elucidated the molecular function, biological process, and cellular components. KEGG pathway analysis was performed using KEGG Automatic Annotation Server (KAAS) to predict the main biological pathways involved in the candidate genes.

Descriptive statistics for body size traits
The body height, body length, chest circumference, and shin girth were collected from 120 Yangyuan jennies. The raw measurement data are detailed in Supplementary Table S1. Ranking from high to low according to body height, we divided the population into two groups, the top 60 donkeys are the larger ones, and the bottom 60 donkeys are the smaller ones. The order of the samples in Supplementary Table S1 is the result of ranking according to body height. The general descriptive statistics of the body size traits are shown in Table 1. The average body height was 139.33 cm with a standard deviation of 10.94, the average body length was 138.62 cm with a standard deviation of 10.43, the average chest circumference was 138.62 cm with a standard deviation of 9.68, and the average shin girth was 15.93 cm with a standard deviation of 1.19.

Genome sequencing and SNPs identification
To sequence the Yangyuan donkey genome, libraries with 400 bp inserts were sequenced and 861 Gb of raw sequence data were produced. The quality of the raw data is shown in   Table S2, with Q20 ≥ 96.7, Q30 ≥ 91.48, and GC content from 43.08 to 44.00, which confirmed the high quality of the sequence data. The construction of the library and sequencing were successful. To enhance the quality of the data, Fastp was used to generate high quality data. In total, 718 Gb of clean data were generated. The filtered clean data were aligned to the reference genome, and the mapping rate was 99.57%-99.79% (Supplementary Table S3), implying that the mapping rate could be used for subsequent variant detection. In general, by using next-generation sequencing technology, and after filtering and mapping, a total of 9,735,956 SNPs were identified using the Illumina NovaSeq sequencing platform with an average sequencing depth of~4.1 (Supplementary Table S4). Their distribution on the  Frontiers in Genetics frontiersin.org chromosomes is shown in Figure 1, and the SNP annotation results are shown in Table 2. We found that 65.8% of the SNPs were concentrated in the intergenic region, 31.48% of the SNPs were concentrated in the intronic region, 0.91% of the SNPs were located upstream, 0.8% of the SNPs were concentrated downstream, and only 0.98% of the SNPs were located in the exon region.

Population structure and phylogenetic analysis
We performed principal component analysis (PCA) with 9,735,956 SNPs to analyze the population structure of 120 Yangyuan donkeys. PCA revealed the relationship between the first two principal components, and we found that most of Frontiers in Genetics frontiersin.org the individuals were clustered together and only a few individuals showed outlier distribution, which demonstrated that the genetic background of the population was relatively uniform (Figure 2A). The phylogenetic tree showed that the samples were mixed together without obvious clustering into separate groups, which was consistent with PCA ( Figure 2B).

Genome-wide association study of body size traits
Based on the detected SNPs, correlation analysis was performed according to the SNP and body height, body length, chest circumference, and shin girth traits to identify the markers and candidate genes that were closely associated with the target traits. In total, 16 SNPs reached a significance threshold of 7, and most of the SNPs were within the intronic, intergenic, and downstream locations of chromosomes 8, 13, and 31. The Manhattan plots for the four body size traits and corresponding Q-Q plots of the p-values against the expected p-values are presented in Figure 3. According to the LD analysis results (Supplementary Figure S1), all positional candidate genes were identified for 50 kb before and after the most significant SNP. Detailed information regarding the significant SNPs identified using GWAS and the putative candidate genes are presented in Table 3, including sphingomyelin phosphodiesterase 4 (SMPD4), ribosomal protein S6 kinase A6 (RPS6KA6), lysophosphatidic acid receptor 4 (LPAR4), glucagon-like peptide 2 receptor (GLP2R), bromodomain, and the WD repeat domain containing 3 (BRWD3), magnesium transporter 1 (MAGT1), zinc finger DHHC-type palmitoyltransferase 15 (ZDHHC15), and cysteinyl leukotriene receptor 1 (CYSLTR1). Other genes located in the 50-kb genome region flanking the significantly associated SNPs are shown in Supplementary Table S5.

Bioinformatic analysis of the candidate genes
The GO function annotation and KEGG pathway enrichment analyses were utilized to explore the biological functions of the candidate genes. The GO enrichment analysis results were classified based on the molecular function, biological process, and cellular components. The top 10 GO terms with the smallest p-value, indicating the most significant enrichment, were selected for display, and the results are shown in Figure 4. According to the KEGG enrichment results, the KEGG pathways with the smallest FDR values, which were the most enriched, are illustrated in Figure 5.

Discussion
The Yangyuan donkey is considered a valued Chinese indigenous breed, famous for its high disease resistance performance and strong adaptability. The donkey industry has been vigorously developed in many areas of northern China; however, the genetic mechanisms of body size traits of these donkeys have rarely been studied. Due to the long average generation interval (5 years) of donkeys, traditional approaches to donkey breeding yield slow progress. Undoubtedly, compared to conventional breeding, marker-assisted selection (MAS) could accelerate breeding programs (Ding et al., 2017). Molecular markers such as SSRs and SNPs are abundant in animal genomes (Campos Mantello et al., 2019). In the breeding of animals, body size trait is an important index for evaluating the growth status (Asiamah et al., 2020), and earlier studies have found a strong correlation between body size and meat production in animals (Takasuga, 2016). In the case of donkey breeding, it was essential to identify molecular markers associated with body size traits.
With the development of molecular markers, the GWAS approach has been used extensively to understand the genetic basis of complex traits in animals (Tam et al., 2019), and is widely considered a powerful approach for exploring marker genes for complex traits at the SNP level . In this research, we focused on the differences among body size traits in Yangyuan donkeys, and we noticed significant body size differences, even within the same population. Prior to the GWAS, we performed a principal component analysis. The results of PCA showed the outlier phenomenon of some smallsized donkeys, which might due to the inclusion of several hybrid Frontiers in Genetics frontiersin.org donkeys with similar phenotypic characteristics to Yangyuan donkeys. We eliminated this stratification by using the PCA value of each sample as covariance in GWAS. By performing GWAS in 120 Yangyuan donkeys using SNPs detected via wholegenome sequencing, we reported some significantly correlated SNPs and candidate genes that could serve as potential candidates associated with body size traits. By annotating candidate regions, sphingomyelin phosphodiesterase 4 (SMPD4) was found to be a candidate gene greatly associated with body height and body length. SMPD4 is a sphingomyelinase from lipid rafts that hydrolyzes sphingomyelin into phosphorylcholine and ceramide, and regulates membrane composition in lipid rafts (Wu et al., 2010). The loss of function of SMPD4 will induce endoplasmic reticulum (ER) stress, autophagy, impaired sphingolipids homeostasis, cell cycle dysregulation, and this could partially explain the severe growth failure and microcephaly in SMPD4-related disorder (Magini et al., 2019;Bijarnia-Mahay et al., 2022). SMPD4 causes several developmental abnormalities through the sphingolipid-related pathway, serving as a hint for our subsequent variation study of how it may affect body size traits in donkeys. Magnesium transporter 1 (MagT1) was considered a candidate related to chest circumference. MagT1 contributes to the maintenance of Mg homeostasis at the cellular level, playing a crucial role in bone metabolism and in the regulation of bone cell functions (Castiglioni et al., 2018). A key event in bone formation is the differentiation of MSC into osteoblasts, a process that involves Mg and its transport (Castiglioni et al., 2018). Research showed that regulating the expression of MagT1 was closely related to the osteogenic differentiation of BMSCs, and MagT1 was upregulated in hMSC differentiation (Lin et al., 2019). In rat MSC, silencing MagT1 blunted osteogenic differentiation (Zheng et al., 2016). However, little is known about the role of MagT1 in bone, and further experiments are needed to verify whether SNPs in Frontiers in Genetics frontiersin.org MagT1 will affect its expression and whether MagT1 causes differences in donkey body size by affecting bone formation. GO analysis showed that in terms of cellular components, the candidates focused on the components of chromosome remodeling complex, lysine methyltransferase, and membrane. Molecular function enrichment analysis highlighted the membrane protein activities, and biological processes were mainly enriched in response to certain organic substances and ion channel activity. According to KEGG analysis, pathways were mainly enriched on the P13K-Akt signaling pathway, the Rap1 signaling pathway, the regulation of actin cytoskeleton, calcium signaling pathway, phospholipase D signaling pathway, and neuroactive ligand-receptor interactions. Previous studies have shown that by regulating the P13K-Akt signaling pathway, the proliferation and differentiation of bone marrow mesenchymal stem cells and bone formation could be promoted (Ledergerber et al., 1985;Zhang et al., 2017). Blocking the PI3K/AKT signaling pathway was found to not only impair chondrocyte differentiation but also inhibit longitudinal bone growth (Ulici et al., 2008). The calcium signaling pathway in bone has been extensively studied, as calcium is a necessary factor for bone cell proliferation and differentiation. Based on the above analysis, we hypothesized that SNPs caused differences in the body size traits of Yangyuan donkeys by affecting the pathways related to bone growth and metabolism, and the enrichment analysis results provided guidance for our subsequent functional studies.
In summary, from the GWAS results for the body size traits with SNPs detected from the whole-genome sequence in 120 Yangyuan donkeys, we identified 16 SNPs significantly associated with body size traits. All of the results provided good ideas for further revealing the Yangyuan donkey body size trait mechanisms, improving donkey production performance, and conducting marker assisted selection in donkey breeding.

Data availability statement
The data presented in the study are deposited in the NCBI repository, accession number PRJNA898246.

Ethics statement
The animal study was reviewed and approved by Institutional Review Board (or Ethics Committee) of China Agricultural University Laboratory Animal Welfare and Animal Experimental Ethical Inspection Committee (protocol code: AW-311022-2-1, date of approval: 1 August 2021). Animal Experiment Guidelines and Regulations of China Agricultural University (Permit No. SCXK: 2012-0766, China). Written informed consent was obtained from the owners for the participation of their animals in this study.

Author contributions
SZ, SS and SW conceived and planned the study. SS and SW wrote the manuscript. SS, SW, NL, SC, SD, YG, XW and YC performed the sample collection and phenotyping. All authors reviewed the manuscript. The author(s) read and approved the final manuscript.

Funding
The work was supported by Inner Mongolia central guidance local scientific research project fund, Hebei donkey industry innovation fund.