Population Genetic Polymorphism of Skeletal Muscle Strength Related Genes in Five Ethnic Minorities in North China

Musculoskeletal performance is a complex trait influenced by environmental and genetic factors, and it manifests differently in different populations. Heilongjiang province, located in northern China, is a multi-ethnic region with human cultures dating back to the Paleolithic Age. The Daur, Hezhen, Ewenki, Mongolian and Manchu ethnic groups in Heilongjiang province may, to a certain extent, have strong physical fitness. Because some important genes are significantly correlated with skeletal muscle function, this study selected 23 SNPs in skeletal muscle strength-related genes and analyzed the distribution of these loci and their genetic diversity in the five ethnic groups. Haploview (version 4.1) was used to perform chi-square and Hardy-Weinberg equilibrium tests and to assess differences between pairs of ethnic groups. R (version 4.0.2) was used for principal component analysis of the different ethnic groups. MEGA (version 7.0) was used to construct the phylogenetic tree of the different ethnic groups. POPGENE (version 1.32) was used to calculate the heterozygosity and FST values of the 23 SNPs. Arlequin (version 3.5.2.2) was used for analysis of molecular variance (AMOVA) among 31 populations. The results showed haplotype diversity in the VDR, ACE (angiotensin-converting enzyme), ACTN3, EPO and IGF1 genes in the five ethnic groups, and the distribution of these genes differed genetically among the five groups. The average gene heterozygosity (AVE_HET) of the 23 SNPs in the five populations was 0.398. The FST values of the 23 SNPs among the five ethnic groups varied from 0.0011 to 0.0137. According to the principal component analysis, the genetic distances among the Daur, Mongolian and Ewenki are relatively small. According to the phylogenetic tree, the five ethnic groups cluster with the Asian populations. These data will enrich existing genetic information on ethnic minorities.


INTRODUCTION
Skeletal muscle is one of the most dynamic and plastic tissues of the human body and is involved in many functions of human life. From a mechanical perspective, the main function of skeletal muscle is to convert the body's chemical energy into mechanical energy, so that the body can generate force and strength, and then generate movement to maintain or benefit human health. From a metabolic perspective, the roles of skeletal muscle include promoting basal energy metabolism, storing important substrates such as amino acids and carbohydrates, and providing most of the oxygen and energy for human movement (Frontera and Ochala, 2015).
With the development of exercise physiology, studies have found that physical training has an important and positive effect on human muscle mass, strength and function (Phu et al., 2015). In addition, genetic differences can influence the ability of skeletal muscle to produce and use energy during exercise (Yan et al., 2016). Studies have highlighted significant correlations between potentially important genes and musculoskeletal function. For example, the VDR gene may have a positive effect on skeletal muscle (Książek et al., 2019), the IGF1 gene increases muscle mass and improves skeletal muscle regeneration (Vassilakos and Barton, 2018), and the EPO gene promotes the differentiation and survival of myoblasts (Lamon and Russell, 2013). Other genes involved in skeletal muscle strength include the endurance gene ACE (Ahmetov and Fedotovskaya, 2015) and strength-related genes such as ACTN3 (Ahmetov and Fedotovskaya, 2015; Seto et al., 2021), AGT (Pickering et al., 2019), PPARG (Ahmetov and Fedotovskaya, 2015; Norouzi et al., 2019) and IL6 (Pickering et al., 2019). Owing to environmental and genetic factors, these traits manifest differently in different ethnic groups (Pitsiladis et al., 2016). For instance, the frequencies of the three ACE genotypes (II, ID, DD) were 25, 50, and 25%, respectively, in Caucasian populations (Jones et al., 2002), which were not significantly different from those of Asian populations in Korea (23, 66, and 11%, respectively) (Oh, 2007). Other studies have found that the ID genotype is significantly associated with outstanding endurance in both European and African American populations (Weyerstraß et al., 2018). The A allele at the rs699 locus of the AGT gene was significantly correlated with endurance in Brazilians (Guilherme et al., 2018). The CT genotype of the ACTN3 gene was markedly correlated with explosive power in Caucasians, and the CC genotype was substantially correlated with explosive power in Asians. The T allele or TT genotype was significantly correlated with explosive power in both Caucasian and Asian male populations, and the TT genotype also significantly affected the explosive power of Russian athletes (Weyerstraß et al., 2018). The CC genotype of the AGT gene was associated with high performance in Polish power athletes, in whom its genotype frequency was 40% (Zarębska et al., 2013). The C allele of IL6 was positively associated with athletic ability in Israelis of Ethiopian descent, improving not only speed but also training recovery (Ben-Zaken et al., 2021).
China is a multi-ethnic country consisting of the Han nationality and 55 ethnic minorities, which together account for about 8% of the total population. To a certain extent, this provides abundant genetic resources for the study of genes related to skeletal muscle strength. Heilongjiang province, located in northern China, is a multi-ethnic region with human cultures dating back to the Paleolithic Age. The Daur, Mongolian, Ewenki, Manchu and Hezhen of Heilongjiang, whose languages belong to the Altaic language family, may to some extent have stronger physical fitness. According to reports, the grip strength of Mongolian, Daur and Ewenki adults is significantly higher than the national level (Dong et al., 2004). In addition, some scholars believe that some indexes of physical characteristics in the Hezhen people are slightly higher than those of the Han people because the Hezhen have long engaged in fishing and hunting (Chen et al., 1999; Wang et al., 2014). Other scholars compiled data on 263 Manchu college students aged 19 to 22 and found that their physical fitness was significantly better than that of Han students (Bi, 1993).
Single nucleotide polymorphisms (SNPs) are DNA sequence polymorphisms caused by variation at a single nucleotide at the genome level, generally with a frequency greater than 1% in the population. SNPs are closely related to the genetic traits of populations and can be used as genetic markers of the genetic structure of different populations (Galinsky et al., 2019). Because some important genes are significantly correlated with skeletal muscle function, this study selected 23 SNPs in the AGT (rs699, rs4762, rs5051, rs5050), PPARG (rs3856806), IL6 (rs2066992), ACE (rs4309, rs4331, rs4341, rs4343, rs4362), ACTN3 (rs1815739, rs540874), EPO (rs1617640, rs551238), IGF1 (rs5742714, rs1520220, rs5742612, rs972936) and VDR (rs7975232, rs757343, rs2228570, rs11568820) genes. We analyzed the allele frequencies of these loci in the Daur, Hezhen, Ewenki, Mongolian and Manchu and compared them with the 26 populations from the 1000 Genomes Project, to investigate the genetic polymorphism of skeletal muscle strength-related genes in the five ethnic groups and to provide theoretical support for explaining the genetic polymorphism of these genes between different populations.

MATERIALS AND METHODS

Study Populations
Blood samples were collected from 882 unrelated individuals (413 males and 469 females; mean age 45 years) belonging to five Chinese ethnic minorities resident in Heilongjiang province for at least three generations. The sample comprised 233 Daur, 106 Mongolian, 73 Ewenki, 220 Manchu and 250 Hezhen individuals. Their geographical distribution is shown in Figure 1. The study was carried out in strict accordance with the Declaration of Helsinki and was approved by the Ethics Committee of Harbin Medical University. All participants signed a written informed consent form.

DNA Extraction and Genotyping
Genomic DNA was extracted from 200 μl of blood using the QIAamp DNA Blood Mini Kit (Qiagen, Hilden, Germany) according to the manufacturer's instructions. Genotyping was performed using the SNPscan™ Kit (Genesky Biotechnologies Inc., Shanghai, China) according to the manufacturer's instructions.

Database Data
The genotype and allele frequency data of individuals from the 26 worldwide populations were downloaded from the Ensembl database at http://grch37.ensembl.org/Homo_sapiens/Tools/DataSlicer. The abbreviations and full names of the 26 populations were obtained from https://www.ncbi.nlm.nih.gov/variation/tools/1000genomes.

Statistical Analysis
Chi-square tests and Hardy-Weinberg equilibrium tests were performed in Haploview to assess differences between pairs of populations; linkage disequilibrium and haplotype analyses of the SNPs were also performed with Haploview (Barrett et al., 2005). In the haplotype analysis, the r² threshold was 0.8. The phylogenetic tree was generated using the UPGMA dendrogram method in MEGA7 (Kumar et al., 2016). Parameters such as AVE_HET, FST, Nm and Nei's genetic distance (based on UPGMA) for the five ethnic groups were calculated using POPGENE (Yeh et al., 1997). Principal component analysis (PCA) was carried out with the R packages "factoextra" and "ggplot2" (Luu et al., 2017; Singh and Soman, 2019). Analysis of molecular variance (AMOVA) was performed with Arlequin (Excoffier et al., 2007).
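To make the r² threshold used in the haplotype analysis concrete, the following is a minimal Python sketch of the pairwise linkage disequilibrium statistic r² for two biallelic SNPs. It assumes that haplotype frequencies are already known; Haploview estimates them from unphased genotypes, and that step is not reproduced here. The function name `ld_r2` and all frequencies are hypothetical and are not taken from the study.

```python
def ld_r2(p_ab: float, p_a: float, p_b: float) -> float:
    """Squared correlation (r^2) between alleles at two biallelic SNPs.

    p_ab : frequency of the haplotype carrying allele A (SNP 1) and allele B (SNP 2)
    p_a  : frequency of allele A at SNP 1
    p_b  : frequency of allele B at SNP 2
    """
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when either SNP is monomorphic")
    return d * d / denom


R2_THRESHOLD = 0.8  # threshold stated in the text

# Hypothetical frequencies, for illustration only (not taken from the study).
r2 = ld_r2(p_ab=0.38, p_a=0.40, p_b=0.42)
in_same_block = r2 >= R2_THRESHOLD
print(f"r^2 = {r2:.3f}, passes threshold: {in_same_block}")
```

Under this sketch, a pair of SNPs would be treated as strongly linked only when r² reaches 0.8; how Haploview combines such pairs into blocks depends on its block-definition settings, which the text does not specify.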

RESULTS

Genotyping Data and Hardy-Weinberg Test
The genotype distribution in the study is summarized in Table 1. All 23 SNPs included in the study were in Hardy-Weinberg equilibrium (p > 0.05). The minor allele frequencies and genotype frequencies of the 23 SNPs in the five populations are summarized in Table 2 and Supplementary Table S1, respectively.
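As an illustration of the Hardy-Weinberg check reported above, the sketch below applies a chi-square goodness-of-fit test with one degree of freedom to the genotype counts of a single biallelic SNP. This is an assumption about the form of the test; the p-values produced by Haploview may be computed differently. The function name `hwe_chi_square` and the genotype counts are hypothetical.

```python
import math


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> tuple[float, float]:
    """Chi-square test of Hardy-Weinberg equilibrium for one biallelic SNP.

    Returns (chi-square statistic, p-value with 1 degree of freedom).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical genotype counts, for illustration only (not taken from Table 1).
chi2, p_value = hwe_chi_square(60, 110, 63)
print(f"chi2 = {chi2:.3f}, p = {p_value:.3f}, in HWE at 0.05: {p_value > 0.05}")
```

A SNP would be regarded as consistent with Hardy-Weinberg equilibrium when the p-value exceeds 0.05, which is the criterion stated in the text.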

The Frequencies of the Polymorphisms Among Different Populations
The SNPs with statistical differences in pairwise comparisons between ethnic groups are summarized in … The average gene heterozygosity (AVE_HET) of the 23 SNPs in the five populations was 0.398 (Table 4). The average observed heterozygosity (OBS_HET) was 0.3957. The observed heterozygosity of rs1815739 and rs540874 in the five populations was relatively high, and that of rs4762 was the lowest. The FST values of the 23 SNPs among the five populations varied from 0.0009 to 0.0137, with an average of 0.0049; that is, 0.49% of the genetic variation existed between populations and 99.51% existed within populations (Table 5). The gene flow of rs3856806 and rs1815739 was relatively large, and the mean value of Nm was 50.6913.
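The quantities in this paragraph can be related to one another with a short Python sketch. It assumes Wright's FST = (HT − HS)/HT for a biallelic SNP with sample-size weighting, and the gene flow estimate Nm = (1 − FST)/(4·FST); both are assumptions about how the reported values were derived, and the exact estimators in POPGENE may differ. The allele frequencies are hypothetical, whereas the sample sizes are those given for the five groups.

```python
def fst_biallelic(freqs: list[float], sizes: list[int]) -> float:
    """Wright's FST = (HT - HS) / HT for one biallelic SNP.

    freqs : frequency of one allele in each population
    sizes : sample size of each population (used as weights)
    """
    total = sum(sizes)
    p_bar = sum(p * n for p, n in zip(freqs, sizes)) / total
    h_t = 2 * p_bar * (1 - p_bar)  # expected heterozygosity, pooled
    h_s = sum(2 * p * (1 - p) * n for p, n in zip(freqs, sizes)) / total
    if h_t == 0:
        return 0.0  # monomorphic everywhere: no differentiation
    return (h_t - h_s) / h_t


def gene_flow(fst: float) -> float:
    """Nm estimated from FST as (1 - FST) / (4 * FST) (assumed formula)."""
    if fst <= 0:
        raise ValueError("Nm is undefined for FST <= 0")
    return (1 - fst) / (4 * fst)


# Sample sizes from the study (Daur, Mongolian, Ewenki, Manchu, Hezhen);
# the allele frequencies are hypothetical, for illustration only.
sizes = [233, 106, 73, 220, 250]
freqs = [0.42, 0.45, 0.44, 0.40, 0.47]
fst = fst_biallelic(freqs, sizes)
print(f"FST = {fst:.4f}, Nm = {gene_flow(fst):.2f}")
print(f"Nm at mean FST 0.0049 = {gene_flow(0.0049):.2f}")
```

Under the assumed formula, the rounded mean FST of 0.0049 gives an Nm of roughly 50.8, close to the reported mean of 50.6913; the small gap plausibly reflects rounding of FST.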

Haplotype Analysis
Five haplotype blocks were identified among the 23 SNPs using an r² threshold of 0.8. The blocks were distributed in the VDR, ACE, ACTN3, EPO and IGF1 genes (Table 6; Figure 2). Blocks with statistically significant differences were mainly concentrated in the VDR and ACE genes. The results showed that haplotype distribution differed among the five ethnic groups. Block 1, containing two SNPs, was constructed in the VDR gene. The most common haplotype was CC, followed by AT and AC. The frequency distribution of CC was statistically significant between Daur … Table S2).
According to Nei's genetic distance among the five ethnic groups, based on the UPGMA method (Supplementary Table S3), the genetic distance between Mongolian and Ewenki was the smallest (0.0007), and the distances between Daur and Mongolian (0.0035) and between Daur and Ewenki (0.0036) were also relatively small. In the PCA plot of the Asian populations (Figure 3), PC1 and PC2 accounted for 37.5 and 28.1% of the total genetic variation, respectively. The genetic distances among Daur, Ewenki and Mongolian were relatively small, which was consistent with the FST values and Nei's genetic distances among the five ethnic groups (Supplementary Tables S2, S3). In the PCA plot of the world populations (Figure 4A), PC1 and PC2 accounted for 51.7 and 32.7% of the total genetic variation, respectively. The PCA plot divided the 31 world populations into five groups, namely AFR, AMR, EAS, EUR and SAS, named after their geographic origins: African, American, East Asian, European and South Asian. Populations belonging to the same large group generally clustered together, which is consistent with the phylogenetic tree of the world populations (Figure 4B). The five ethnic groups included in the study clustered with the Asian populations.
In addition, the mean FST and mean Nm values of the 23 SNPs among the 31 populations were 0.098 and 2.3017, respectively (Supplementary Table S4). According to the analysis of … Table S5).
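The pairwise genetic distances reported above can be illustrated with a minimal Python sketch of Nei's standard genetic distance, D = −ln(I), computed from allele frequencies at biallelic SNPs. The text does not state which variant of Nei's distance was used, so the choice of the standard (1972) form without sample-size correction is an assumption, and the frequency vectors are hypothetical.

```python
import math


def nei_distance(freqs_x: list[float], freqs_y: list[float]) -> float:
    """Nei's standard genetic distance D = -ln(I) for biallelic loci.

    freqs_x, freqs_y : frequency of the same allele at each locus
                       in populations X and Y.
    """
    if len(freqs_x) != len(freqs_y) or not freqs_x:
        raise ValueError("both populations need the same non-empty set of loci")
    j_x = j_y = j_xy = 0.0
    for p_x, p_y in zip(freqs_x, freqs_y):
        j_x += p_x * p_x + (1 - p_x) ** 2        # homozygosity in X
        j_y += p_y * p_y + (1 - p_y) ** 2        # homozygosity in Y
        j_xy += p_x * p_y + (1 - p_x) * (1 - p_y)  # identity between X and Y
    identity = j_xy / math.sqrt(j_x * j_y)  # Nei's genetic identity I
    return -math.log(identity)


# Hypothetical allele frequencies at three SNPs, for illustration only.
pop_a = [0.42, 0.18, 0.55]
pop_b = [0.45, 0.20, 0.52]
print(f"Nei's D = {nei_distance(pop_a, pop_b):.4f}")
```

A matrix of such pairwise distances is the usual input to UPGMA clustering; smaller values correspond to populations with more similar allele frequencies.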

DISCUSSION
Under natural selection imposed by different environments, material conditions and pathogens, different peoples have developed specific genetic structures along with distinct cultures, phenotypes and languages. In East Asia, China has the largest Han population in the world, together with 55 officially recognized ethnic minorities, each with its own cultural background. More than nine language families are spoken in China (Chen et al., 2019). Among these groups, five ethnic groups in Heilongjiang province in northern China, whose languages belong to the Altaic language family, may have stronger physical fitness (Bi, 1993), which reflects the basic capacity for human muscle activity. Some studies have found a significant association between genotype and skeletal muscle phenotype. For example, the presence of certain SNPs is associated with better skeletal muscle strength performance (Khanal et al., 2020). The 23 SNPs selected for this study all lie in genes related to skeletal muscle strength and were used to further study the genetic composition and phylogeny of the five ethnic groups. All 23 SNPs were consistent with Hardy-Weinberg equilibrium. In addition, in pairwise comparisons of the five populations, the genetic differences were found mainly in the IL6, VDR, AGT, ACE and IGF1 genes. For example, AGT encodes angiotensinogen, a protein that is involved in the renin-angiotensin-aldosterone system (RAAS) and is related to muscle growth. IGF1 is an important regulator not only of muscle mass and function but also of bone, not only during development but throughout the human life cycle (Moriwaki et al., 2019). Vitamin D levels are closely related to the presence of vitamin D receptors in most human extraskeletal cells, and exposure of skeletal muscle to vitamin D leads to the expression of multiple myogenic transcription factors that promote the proliferation and differentiation of muscle cells (Wiciński et al., 2019).
The angiotensin-converting enzyme (ACE) gene is associated with superior muscle metabolic performance and muscle endurance (Vaughan et al., 2013). Erythropoietin plays an important role in regulating metabolic homeostasis and bone remodeling (Suresh et al., 2019). Interleukin-6 (IL-6), the prototypical myokine, was identified as a muscle-derived cytokine 15 years ago (Karstoft and Pedersen, 2016). FST plays a central role in population and evolutionary genetics because it reflects the degree of genetic differentiation between populations (Meirmans and Hedrick, 2011). The FST values of the 23 SNPs among the five ethnic groups varied from 0.0009 to 0.0137, indicating almost no genetic differentiation among the populations. The mean value of Nm likewise indicates that genetic variation lay mainly within rather than between populations, which is consistent with the FST results of this study. Overall, we found little difference in genetic distance among the five populations, possibly because all five ethnic groups are located in Heilongjiang province; this is consistent with the geographical location of the populations (Tian et al., 2008). In addition, a total of five haplotype blocks exist among the 23 SNPs (Figure 2). Based on the FST values, we concluded that rs7975232 and rs1815739 differed statistically among the five ethnic groups (Table 5). The same gene can perform different functions in the body: rs7975232 of the VDR gene has been reported to be related to obesity and diabetes and to serve as a genetic marker for them. The rs7975232 polymorphism of the VDR gene was found to be positively correlated with obesity, as assessed by skinfold thickness and body fat percentage, in the Chinese Han population (Shen et al., 2019). Another recent study also found that the rs7975232 polymorphism appears to be associated with overweight/obesity (Wang et al., 2021).
Because they live in the extremely cold region of northern China, the five ethnic groups in Heilongjiang province may be at higher risk of overweight or obesity owing to environmental factors, eating habits and genetic factors. As is well known, obesity is an important risk factor for diabetes. This may in turn suggest that ethnic groups in the extremely cold region of northern China are, to a certain extent, susceptible to diabetes (Li et al., 2020). Interestingly, another locus with significant genetic variation helps explain how extreme cold affects human skeletal muscle. Positive selection of an allele of rs1815739 in cold climates suggests a mechanism: a shift toward slower type I MyHC in α-actinin-3-deficient muscle, combined with a shift in neuronal muscle activation toward increased muscle tone rather than overt shivering, supports the key thermogenic role of human skeletal muscle during cold exposure (Wyckelsma et al., 2021). Therefore, we believe that the genetic differences at rs1815739 and rs7975232 among the five ethnic groups may arise because these groups live in Heilongjiang province, a high-latitude, severely cold region of China. Principal component analysis reduces high-dimensional data (all genotypes) to a small number of components that capture the largest share of genotypic variation. According to the PCA results for the Asian populations, the genetic distances among Daur, Ewenki and Mongolian were relatively small, indicating that gene exchange among Daur, Mongolian and Ewenki may have occurred to some extent during historical and modern admixture (Liu et al., 2007; Gao et al., 2006), which is consistent with the Nei's genetic distances and FST values among the five ethnic groups.
Other studies show that, from the perspectives of linguistic kinship, migration history and origin, the Mongolian and the Daur are very closely related, which suggests that the two groups may share a common origin (Hou et al., 2007). According to the world population phylogenetic tree and PCA, the genetic distances between the five populations and the Asian populations are relatively small, and the five populations cluster with the Asian populations. The genetic variation of the 31 populations occurred mainly within populations (Supplementary Tables S4, S5).
In conclusion, geographical and linguistic divisions have shaped the genetic structure of modern populations. Cluster analysis shows that the five ethnic groups in Heilongjiang province cluster with the East Asian populations, and the genetic distances among Daur, Mongolian and Ewenki are relatively small. To better study the genetic characteristics of skeletal muscle strength-related genes in different populations, further analysis and validation will require groups of greater ethnic, cultural, geographical and linguistic diversity, as well as more genomic data combined with archaeological data and population history.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding author.

ETHICS STATEMENT
The study involving human participants was reviewed and approved by Harbin Medical University. The participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
SF, QL and BD conceived the idea for this study. JB and SF performed or supervised the laboratory work. BD, QL, TZ and MJ conducted the experiments. BD, QL, TZ, MJ, XL and YF analyzed the data. BD wrote the manuscript.