Prediction and Identification of Power Performance Using Polygenic Models of Three Single-Nucleotide Polymorphisms in Chinese Elite Athletes

Objective: The manuscript aims to explore the relationship between power performance and SNPs of Chinese elite athletes and to create polygenic models. Methods: One hundred three Chinese elite athletes were divided into the power group (n = 60) and endurance group (n = 43) by their sports event. Best standing long jump (SLJ) and standing vertical jump (SVJ) were collected. Twenty SNPs were genotyped by SNaPshot. Genotype distribution and allele frequency were compared between groups. Additional genotype data of 125 Chinese elite athletes were used to verify the screened SNPs. Predictive and identifying models were established by multivariate logistic regression analysis. Results: ACTN3 (rs1815739), ADRB3 (rs4994), CNTFR (rs2070802), and PPARGC1A (rs8192678) were significantly different in genotype distribution or allele frequency between groups (p < 0.05). The predictive model consisted of ACTN3 (rs1815739), ADRB3 (rs4994), and PPARGC1A (rs8192678), the area under curve (AUC) of which was 0.736. The identifying model consisted of body mass index (BMI), standing vertical jump (SVJ), ACTN3, ADRB3, and PPARGC1A, the area under curve (AUC) of which was 0.854. Based on the two models, nomograms were created to visualize the results. Conclusion: Two models can be used for talent identification in Chinese athletes, among which the predictive model can be used in adolescent athletes to predict development potential of power performance and the identifying one can be used in elite athletes to evaluate power athletic status. These can be applied quickly and visually by using nomograms. When the score is more than the 130 or 148 cutoff, it suggests that the athlete has a good development potential or a high level for power performance.


INTRODUCTION
The physical performance and athletic capacity of elite athletes, such as endurance, power, speed, flexibility, and sensitivity, are influenced by many factors, among which genetic factors are important (Puthucheary et al., 2011;Ahmetov et al., 2016;Peplonska et al., 2017). It is a major task and research direction to use genetic factors to predict the development potential of physical performance and athletic capacity in adolescent athletes and to evaluate and identify the level of elite athletes (Breitbach et al., 2014;Webborn et al., 2015;Pickering et al., 2019).
In physical performance and athletic capacity, power performance has a relatively high heritability (Maciejewska-Skrendo et al., 2019), which is particularly important for power-oriented sports and critical in talent identification (Yang et al., 2017). It has been reported that the estimated heritability of muscle strength and mass ranges from approximately 30 to 80%, with large differences among muscle groups, contraction velocities, and muscle lengths (Peeters et al., 2009). Therefore, genetic variants may contribute to elite athletic performance such as power, and it may be possible to use them to identify future elite athletes. A single-nucleotide polymorphism (SNP) is a form of genetic variant that is often used to investigate the genetic factors of physical performance (Sarzynski et al., 2016). SNP mainly refers to the DNA sequence polymorphism caused by single nucleotide variation at the genomic level. It is one of the most common human heritable variations, accounting for more than 90% of all known polymorphisms. SNPs exist widely in the human genome, with an average of one in every 300 base pairs. It is estimated that the total number of SNPs can reach 3 million or more. A SNP can result from the transition or transversion of a single base, or from the insertion or deletion of a base (Brookes, 1999). For example, the R577X SNP (rs1815739) in the ACTN3 gene modifies the attainment of elite power-oriented athletic performance status (Yang et al., 2017;Tharabenjasin et al., 2019). In this SNP, a common C-to-T base substitution converts the codon for an arginine residue (R) into a premature stop codon (X). X allele homozygotes are deficient in the protein encoded by ACTN3, α-actinin-3, which is expressed exclusively in fast-twitch muscle fibers.
As a result, individuals with the XX genotype tend to have lower proportions of fast-twitch muscle fibers (Vincent et al., 2007) and decreased power performance, because fast-twitch muscle fiber is an important component of power-oriented performance (Pickering et al., 2019). In fact, physical performance may be influenced by hundreds of related SNPs (Ben-Zaken et al., 2015;Grealy et al., 2015). Previous polygenic studies did not differentiate the role of SNPs in prediction and identification; their results are therefore hard to apply in talent identification (Ruiz et al., 2010).
Based on the above, we recruited Chinese elite athletes to explore the relationship between polygenic profiles and elite power performance. We then attempted to establish the predictive and identifying models, based on which nomograms were also created for talent identification in adolescent and elite athletes.

Ethics Approval
This study was conducted in accordance with the Declaration of Helsinki and approved by the ethics committee of the School of Life Sciences, Fudan University (Shanghai, China). Before the study, written informed consent was obtained from all participants.

Standing Long Jump and Standing Vertical Jump Data Collection
The athletes' best standing long jump (SLJ) and standing vertical jump (SVJ) results were collected retrospectively in the Oriental Land Training Base (Shanghai, China). All tests were conducted with a standardized protocol employed at the training base. Specifically, the SLJ was assessed on a test pad designed for the standing long jump. Subjects started the test with their toes behind the test line; the distance from the rearmost heel strike to the starting line was measured. Three trials were allowed for each subject to achieve their maximal jump performance. The SVJ results were recorded by a standardized electronic vertical jump tester (Jianmin, Beijing, China). Subjects stood on the test board at the marked location and attempted to reach the maximum height vertically, with a landing point within 10 cm of the starting point. The examiner recorded the height displayed on the electronic screen, and the highest two of the three jumps were used. Before all testing, subjects completed a supervised warm-up of running and dynamic stretching, and a more specific, submaximal jump warm-up protocol (Ahmetov et al., 2016).

Selection of Genetic Polymorphisms
Some reviews revealed that more than 40 SNPs (Ahmetov and Fedotovskaya, 2015) and 69 SNPs (Naureen et al., 2020) are associated with power performance. These SNPs have mostly been reported in European ancestry populations and less often in the Chinese Han population. From these SNPs, we retained as candidate gene markers those with frequency distribution differences reported in the general population or disease population of Chinese Han nationality, and removed those without frequency distribution differences or unreported in the Chinese Han population. Finally, 20 gene polymorphisms (within 17 different genes) were considered to be associated with physical performance and exercise-related phenotypes. Because the HIF1A (rs28708675) genotype showed no variation in the power or endurance athletes (all subjects were AA genotype), 16 genes and 19 polymorphisms were analyzed and listed in Table 1.

Genotyping
Saliva samples were collected via the bio-sample collection kit (Applied Halo Biomat Tech, Suzhou, China) and stored in a −20°C freezer until further use. DNA was extracted from saliva by QIAGEN silica gel adsorption kit (Qiagen Inc., Valencia, CA, United States). Genotyping was conducted by multiplex SNaPshot technology using an ABI fluorescence-based assay allelic discrimination method (Applied Biosystems, Foster City, CA, United States) as described previously (Ahmetov et al., 2016). Briefly, multiplexed PCR and multiplexed single-base extension reactions were first conducted, followed by capillary electrophoresis. PCR multiplexes were conducted in a 10-µl reaction, including 5 µl of SNaPshot Multiplex Kit solution (Applied Biosystems), 2 µl of purified PCR product, 1 µl of the extension primer, and 2 µl of high-purity water. The products of the SNaPshot were processed with ABI3730XL (Applied Biosystems) and data from 20 gene loci variants were analyzed with Gene Mapper Analysis Software, version 4.1 (Applied Biosystems).
Genotypes were assessed independently by two investigators blinded to the study. As a quality control measure, 10% of randomly selected DNA samples were analyzed at least twice, and the results were 100% concordant.

Data Analysis
Chi-squared (χ²) tests were used to test for Hardy-Weinberg equilibrium. χ² tests were also employed to compare the genotype distribution and allele frequencies of SNPs between the groups in the training and validation cohorts, and p values were adjusted using the Benjamini and Hochberg procedure for multiple comparisons. Odds ratios (OR) were calculated to determine the dominant genotype of the SNPs screened out. Univariate logistic regression analysis was performed to calculate the significance and strength of the association between each factor, including SNPs, and athletic status, and multivariate logistic regression analysis was subsequently performed to screen models. Multivariate logistic regression analysis was also conducted to determine the factors independently associated with athletic status. The predictive model contained only SNPs and the identifying model contained SNPs and other phenotypes; both models were cross-validated. The results of multivariate analysis were reported as odds ratios and 95% confidence intervals, and p < 0.05 was considered to indicate statistical significance in the multivariate analysis. After these analyses, nomograms were built according to the results for predictive and identifying polygenic models of power performance. Data were analyzed using SPSS 26.0 for Windows and R version 4.1.0. p < 0.05 was considered statistically significant.
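As an illustration of the Hardy-Weinberg check described above, the following minimal sketch computes the χ² statistic (1 degree of freedom) for one biallelic SNP from its genotype counts. The function name and the example counts are hypothetical and are not taken from the study data; the study itself used SPSS and R.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-squared test (1 df) for Hardy-Weinberg equilibrium at a biallelic SNP.

    Arguments are the observed counts of the three genotypes (AA, AB, BB).
    Returns the chi-squared statistic and its p value.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p                      # frequency of allele B
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of the chi-squared distribution with 1 df:
    # P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical genotype counts, for illustration only
chi2, p_value = hwe_chi_square(30, 50, 23)
in_equilibrium = p_value > 0.05
```

A p value above 0.05 is read here as no detectable departure from equilibrium, matching the criterion used in the text.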

Determination of Single-Nucleotide Polymorphisms Related to Power Performance
All the gene variants were in Hardy-Weinberg equilibrium (p > 0.05). There were significant differences in genotype distribution or allele frequency of four SNPs: ACTN3 (rs1815739), ADRB3 (rs4994), CNTFR (rs2070802), and PPARGC1A (rs8192678) (p < 0.05); however, none of the SNPs remained significant after correction for multiple comparisons using the method of Benjamini and Hochberg (Benjamini and Hochberg, 1995) (adj. p > 0.05) (Table 1). To exclude false positives, the screened SNPs were further verified using additional data.
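The effect of the Benjamini and Hochberg correction can be sketched as follows. This is a generic implementation of the standard step-up adjustment, not the authors' code, and the p values in the example are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the original order."""
    m = len(p_values)
    # Indices of the p values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so that adjusted values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p values for a handful of SNPs (illustration only)
raw = [0.012, 0.030, 0.041, 0.200, 0.650]
adjusted = benjamini_hochberg(raw)
still_significant = [a < 0.05 for a in adjusted]
```

With many SNPs tested at once, a raw p value just below 0.05 is usually pushed above 0.05 after adjustment, which is the pattern reported in this section.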

Polygenic Models Predict and Identify Power Performance
All variables including candidate SNPs were used for univariate logistic regression in all participants (Table 2). SNPs and phenotypes including gender, age, BMI, SLJ, and SVJ were used for multivariate logistic regression analysis with five models (Table 3). According to the fitted and adjusted effect of the models and the different purposes of talent identification in athletes, model 1 (only three SNPs), model 4 (BMI, SLJ, and three SNPs), and model 5 (BMI, SVJ, and three SNPs) were selected (Table 3). The predictive model contained only the three SNPs ACTN3 (rs1815739), ADRB3 (rs4994), and PPARGC1A (rs8192678), while the identifying models contained the three SNPs together with phenotypic indices.
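To make the link between logistic regression coefficients, odds ratios, and predicted probabilities concrete, here is a minimal sketch. The intercept, coefficients, and 0/1 genotype coding are hypothetical placeholders chosen for illustration; they are not the fitted values from Table 3.

```python
import math


def predict_probability(intercept, coefficients, features):
    """Probability of the positive class (here: power athlete) from a logistic model."""
    linear_predictor = intercept + sum(b * x for b, x in zip(coefficients, features))
    return 1.0 / (1.0 + math.exp(-linear_predictor))


def odds_ratio(coefficient):
    """Odds ratio associated with a one-unit increase in a predictor."""
    return math.exp(coefficient)


# Hypothetical model (illustration only): one indicator per SNP, coded 1 when
# the athlete carries the genotype assumed to favor power performance.
intercept = -1.0
coefficients = [0.8, 0.6, 0.7]  # assumed order: ACTN3, ADRB3, PPARGC1A
athlete = [1, 0, 1]

probability = predict_probability(intercept, coefficients, athlete)
snp_odds_ratios = [odds_ratio(b) for b in coefficients]
```

Because each coefficient enters the linear predictor separately, each SNP contributes with its own weight rather than an equal one.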
After multiple regression analysis using SNPs and other phenotypes, two identifying models were obtained (Figure 2C and Figure 2E). The difference between the two models was that one included the standing long jump (SLJ) index and the other included the standing vertical jump (SVJ) index. Both models had good goodness of fit (SLJ: Hosmer-Lemeshow test p = 0.627; SVJ: Hosmer-Lemeshow test p = 0.462). Internally bootstrap-corrected ROC analysis of the two models showed that the AUC value of the model with SVJ (AUC 0.854, 95% CI: 0.784-0.925, Figure 2F) was higher than that of the model with SLJ (AUC 0.819, 95% CI: 0.740-0.899, Figure 2D). Therefore, the model including SVJ was selected as the polygenic identifying model of power performance.
To verify the external reliability of the models, genotyping data, BMI, and SVJ of another 125 elite athletes were used to externally validate the predictive and identifying models. After ROC analysis, the AUC values of the predictive and identifying models were 0.701 (95% CI: 0.609-0.794) and 0.766 (95% CI: 0.683-0.849), respectively (Figure 3). The AUC values of the two models were both greater than 0.7, which indicated that the two models had acceptable prediction and recognition ability for external data.
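The AUC values reported above can be read as the probability that a randomly chosen power athlete receives a higher model score than a randomly chosen endurance athlete. A minimal sketch of that rank-based calculation follows; the scores are hypothetical, and this is not the bootstrap-corrected procedure used in the study.

```python
def auc(scores_positive, scores_negative):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    wins = 0.0
    for sp in scores_positive:
        for sn in scores_negative:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(scores_positive) * len(scores_negative))


# Hypothetical predicted probabilities (illustration only)
power_scores = [0.81, 0.64, 0.55, 0.47]
endurance_scores = [0.52, 0.38, 0.30]
model_auc = auc(power_scores, endurance_scores)
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of the two groups.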

Nomograms of Predictive and Identifying Polygenic Models for Power Performance
Factors in Figure 2A, including the three SNPs ACTN3 (rs1815739), ADRB3 (rs4994), and PPARGC1A (rs8192678), were used to create an estimation nomogram for predicting power performance. In the receiver operating characteristic (ROC) curve (Figure 2B), the threshold value was 0.429, according to which the cutoff score could be determined in the nomogram (Figure 4A). Furthermore, a calibration curve graphically showed good agreement between the nomogram prediction and athletic status (Figure 4B). The identifying model containing the three SNPs, BMI, and SVJ (Figure 2E) was also used to develop a nomogram for identifying power performance. The threshold value of the ROC curve (Figure 2F) was 0.784, according to which the cutoff score could also be determined in the nomogram of the identifying model (Figure 4C). The calibration curve also showed good agreement between the nomogram identification and athletic status (Figure 4D).
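How a probability threshold translates into a nomogram cutoff score can be sketched under a common nomogram convention, assumed here rather than taken from the paper: the predictor with the largest effect span is given 100 points, and every other predictor is scaled proportionally. All coefficients, ranges, and names below are hypothetical.

```python
import math


def zero_point_value(coef, low, high):
    """Predictor value that maps to 0 points (the end with the lowest contribution)."""
    return low if coef >= 0 else high


def points_per_unit(coefs, lows, highs):
    """Points per unit of linear predictor so the widest predictor spans 100 points."""
    spans = [abs(coefs[k]) * (highs[k] - lows[k]) for k in coefs]
    return 100.0 / max(spans)


def points_for(name, value, coefs, lows, highs, scale):
    """Nomogram points contributed by one predictor value."""
    base = zero_point_value(coefs[name], lows[name], highs[name])
    return coefs[name] * (value - base) * scale


def cutoff_points(threshold, intercept, coefs, lows, highs, scale):
    """Total points at which the predicted probability equals the threshold."""
    lp_zero = intercept + sum(
        c * zero_point_value(c, lows[k], highs[k]) for k, c in coefs.items()
    )
    logit = math.log(threshold / (1.0 - threshold))
    return (logit - lp_zero) * scale


# Hypothetical model (illustration only; not the published coefficients)
coefs = {"snp_a": 1.0, "snp_b": 2.0}
lows = {"snp_a": 0, "snp_b": 0}
highs = {"snp_a": 2, "snp_b": 2}
scale = points_per_unit(coefs, lows, highs)
cutoff = cutoff_points(0.5, -3.0, coefs, lows, highs, scale)
```

The total points are a linear rescaling of the model's linear predictor, so comparing a total score with the cutoff is equivalent to comparing the predicted probability with the ROC threshold.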

Scores of Two Nomograms
An ROC curve is a plot of the sensitivity versus 1-specificity of a diagnostic test. The different points on the curve correspond to the different cutoff values used to determine whether the results of the test are positive. The area under an ROC curve can be considered the average value of the sensitivity for a test over all possible values of specificity or vice versa (Mandrekar, 2010).
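The construction of ROC points, and one common way of picking a threshold from them, can be sketched as follows. The text does not state which criterion produced the reported thresholds, so the use of the Youden index (sensitivity + specificity − 1) here is an assumption, and the labels and scores are hypothetical.

```python
def roc_points(labels, scores):
    """Return (threshold, sensitivity, 1 - specificity) for each distinct score.

    labels: 1 for the positive class, 0 for the negative class.
    A case is called positive when its score is >= the threshold.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = []
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= t)
        fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= t)
        points.append((t, tp / n_pos, fp / n_neg))
    return points


def youden_threshold(labels, scores):
    """Threshold maximizing sensitivity + specificity - 1 (assumed criterion)."""
    best = max(roc_points(labels, scores), key=lambda r: r[1] - r[2])
    return best[0]


# Hypothetical labels and predicted probabilities (illustration only)
labels = [1, 1, 1, 0, 0, 0]
scores = [0.90, 0.70, 0.40, 0.50, 0.30, 0.20]
threshold = youden_threshold(labels, scores)
```

Each distinct threshold yields one point on the curve; lowering the threshold raises sensitivity at the cost of specificity.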

DISCUSSION
The research on the relationship between gene variants and physical performance started from a single gene and a single SNP. With the expansion of SNP research on candidate genes related to physical performance, an increasing number of SNPs was put into the candidate gene pool. At regular intervals, researchers summarize the progress of such research and new genes (Macarthur and North, 2005;Hagberg et al., 2011;Roth et al., 2012;Perusse et al., 2013;Ahmetov et al., 2016;Sarzynski et al., 2016). However, with the deepening of research, researchers found that single SNP has an insignificant impact on certain physical performance or athletic status. For example, the contribution of ACTN3 R577X polymorphism, which has been studied the most, was estimated to be only 2.5% (Moran et al., 2007). It suggested that it was inadequate to rely on the variant of a single SNP to reflect the impact on a certain performance. The probability of becoming an elite athlete is based on having a set of alleles related to physical performance (Ahmetov et al., 2009;Ruiz et al., 2009). Therefore, it is more appropriate to use the variants of multiple genes or SNPs to reflect the impact on a certain physical performance. Starting from the research of single candidate genes, this study screened several SNPs that may be related to power performance through statistical tests. On this basis, polygenic models were established to reflect the impact on power performance by using multivariate logistic regression. It is more effective to quantify the effect of multiple genes through mathematical methods than that of a single gene alone. Many researchers also use this method to explore the effect of polygenic profiles on physical performance and athletic status (Ruiz et al., 2010;Massidda et al., 2014). 
However, there were some potential problems in these studies. For example, the total genotype score (TGS) was used to reflect the polygenic profiles, but each SNP was given equal weight in the TGS, without considering differences in contribution. Additionally, the operability of the predictive/identifying models was relatively poor, making them difficult to apply in practice. In our current study, multivariate logistic regression was used to account for the contribution weighting of each SNP, and nomograms were used to visualize the models and improve operability, which was more rigorous and feasible than the approach of the above-mentioned studies.
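The contrast between an equally weighted TGS and a regression-weighted score can be illustrated with a short sketch. The TGS form used here, (100 / 2n) × the sum of genotype scores with each genotype scored 0, 1, or 2, is the commonly used definition and is assumed to match the cited studies; the weights in the second function are hypothetical.

```python
def total_genotype_score(genotype_scores):
    """Equally weighted TGS on a 0-100 scale; each genotype is scored 0, 1, or 2."""
    n = len(genotype_scores)
    return 100.0 / (2 * n) * sum(genotype_scores)


def weighted_genotype_score(genotype_scores, weights):
    """Score in which each SNP contributes according to its own weight."""
    return sum(w * g for w, g in zip(weights, genotype_scores))


# Two hypothetical athletes with the same TGS but different genotype profiles
athlete_1 = [2, 0]
athlete_2 = [0, 2]
weights = [0.9, 0.3]  # hypothetical regression coefficients

same_tgs = total_genotype_score(athlete_1) == total_genotype_score(athlete_2)
score_1 = weighted_genotype_score(athlete_1, weights)
score_2 = weighted_genotype_score(athlete_2, weights)
```

The two athletes are indistinguishable by TGS, while the weighted score separates them as soon as the SNPs differ in effect size.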
The ACTN3 R577X polymorphism has repeatedly been shown to be associated with elite power performance in many ethnic populations, and it is the SNP most widely recognized by researchers so far (Tharabenjasin et al., 2019). The association has also been shown in the East Asian population (Yang et al., 2016;Yang et al., 2017). Compared with ACTN3 R577X, the ADRB3 (rs4994) and PPARGC1A (rs8192678) polymorphisms have been reported less often, especially in the East Asian population. Beta-adrenergic receptors are a subgroup of G protein-coupled receptors involved in the regulation of energy metabolism. As a member of the beta-adrenergic receptor family, the ADRB3 (adrenoceptor beta 3) gene is located at the 8p11.23 region of the human genome and modulates catecholamine-induced stimulation of adenylate cyclase via the action of G proteins (Yang and Tao, 2019). It is generally believed that ADRB3 is mainly expressed in adipocytes and functions to mediate lipolysis and thermogenesis (Collins and Surwit, 2001). The levels of ADRB3 mRNA and protein in the adipose tissue of obese patients and overweight individuals are significantly decreased (Kurylowicz et al., 2015;Cao et al., 2018). Xie et al. (2020) reported, in an evidence-based meta-analysis, that the ADRB3 (rs4994) polymorphism was associated with obesity/overweight during childhood and adolescence in the East Asian population. They found that children and adolescents with the C allele had an increased risk of obesity and overweight (Xie et al., 2020). There are few reports on the relationship between the ADRB3 (rs4994) polymorphism and elite athletic performance. Santiago et al. (2011) reported that the ADRB3 (rs4994) polymorphism was associated with elite endurance performance in Spanish male athletes. They selected 53 elite power athletes and 100 endurance athletes in Spain as subjects, and 100 people without sports training experience as the control group.
After genotyping the rs4994 locus of the ADRB3 gene, they found that the distribution of genotypes differed significantly among the three groups. In pairwise comparisons, there was a significant difference only between the endurance group and the control group, and none between the power group and the endurance group. It was concluded that the ADRB3 (rs4994) polymorphism was associated with elite endurance performance. In terms of statistical significance, the results of that study were inconsistent with ours, but the genotype distributions in the power and endurance groups showed a consistent trend across the two studies. The inconsistency of the results may be related to ethnic differences (European ancestry populations and East Asian populations) and the sample size. PPARGC1A (peroxisome proliferator-activated receptor gamma co-activator-1-alpha) is an inducible transcription coactivator, which can participate in many biological processes by promoting mitochondrial energy metabolism, such as adaptive thermogenesis, skeletal muscle fiber type conversion, glucose/fatty acid metabolism, and cardiac development (Fontecha-Barriuso et al., 2020). Some studies showed that the PPARGC1A gene controls the expression of several genes encoding key enzymes involved in fatty acid oxidation and mainly regulates the induction of muscle adaptation training (Ahmetov et al., 2006). Endurance training can increase PPARGC1A mRNA expression (Franks et al., 2003).
In European ancestry populations, carrying A allele was beneficial to the improvement of cardiopulmonary function (Mathai et al., 2008); however, PPARGC1A (rs8192678) polymorphism was not associated with cardiopulmonary function and VO2max in German, Dutch (Stumvoll et al., 2004), and Northern Chinese populations (He et al., 2008), indicating that the relationship between PPARGC1A (rs8192678) gene polymorphism and physical performance was not reproducible in different races and populations, and there was a large ethnic difference. Eynon et al. (2010) studied 155 Israeli athletes and found that the A allele frequency of endurance athletes was low, and GG genotype was conducive to the increase in aerobic capacity. Maciejewska et al. (2012) reported that the frequency of G allele in PPARGC1A gene polymorphism of Polish and Russian athletes was higher than that of the A allele, and concluded that the G allele was the dominant allele and the GG genotype was the dominant genotype for the Polish and Russian endurance athletes. On the surface, the results of the above two studies were opposite to the results of our study on PPARGC1A (rs8192678) polymorphism. The result of this study was that PPARGC1A (rs8192678) polymorphism was associated with power performance, while the other two studies were associated with endurance performance. However, from the detailed data of PPARGC1A (rs8192678) genotype distribution and allele frequency reported in several studies, there was still some consistency. The frequency of the A allele and G allele in the endurance group was 36 and 64%, which was consistent with the results reported by Eynon et al. (2010). If the data of power and endurance groups in our study were combined, the G allele of all athletes would be higher than the A allele, which would be consistent with the results reported by Polish and Russian athletes. Because of the different emphases of research and analysis, the opposite conclusion had been obtained. 
According to the results of our study, it can also be understood that the G allele of PPARGC1A (rs8192678) polymorphism is associated with endurance performance, while the A allele is associated with power performance.
Most of the candidate SNPs involved in this study were selected from the results reported in English literature, and most of the study populations were of European ancestry in Europe and the United States (Ahmetov and Fedotovskaya, 2015;Naureen et al., 2020). The three SNPs in our results have been reported in Caucasian populations. Applying the results of this study directly to Caucasian populations should be feasible in theory, but data from those populations should be used for validation. Beyond European ancestry populations, the results should be applied to other non-Chinese Han populations with caution because no relevant research results have been reported to support this. Given the large number and diversity of ethnic groups in China, the results can be applied to the Chinese Han population, while for other ethnic minorities they should be used with caution until validated.
Previous studies on multiple genes aimed to predict physical performance development in adolescent athletes and to identify sports ability and athletic status in adult athletes; they focused on the relationship between SNPs and physical performance and athletic status to create models for talent identification. Only SNPs were included in those models, so they could only predict the development trend of physical performance in adolescents, and their power to identify sports ability and athletic status in adult athletes was insufficient. Our study considered that there was a difference between adolescents and adult athletes in the development potential of physical performance. Adolescent athletes are constantly improving and growing in ability, yet adult athletes, especially elite adult athletes, have reached or nearly reached their peak, so the models for the two groups should also be different. The model containing only SNPs can be used in adolescents, and the other, which includes SNP information and phenotype indicators related to power performance, can identify the current sports ability and athletic status of adult athletes. According to these application objectives, the models in our study were divided into predicting and identifying. BMI and standing vertical jump (SVJ) were added to the identifying model, whose AUC value improved from 0.736 to 0.854. This showed that the power to identify power performance improved after phenotypes were added. Compared with previous similar studies (Ruiz et al., 2009;Ruiz et al., 2010;Massidda et al., 2014;Ben-Zaken et al., 2015), this study considered the different requirements of the target populations for the models, which made the models more scientific and feasible. The target population includes adolescent and elite athletes.
The characteristics of the adolescent population are not obvious before or just upon contact with a sports event, so it is hard to tell which type of sports event is suitable; at this stage, coaches may need some guidance on the development trend of power performance to decide whether an adolescent should engage in power-oriented events. For the elite athletes engaged in power-oriented events, coaches and scientific research personnel can identify the power status by using the model, so that they can better understand the physical state of athletes and make suitable training and competition plans. The results of this study can provide insights for research groups that cannot afford sports genomics methods (e.g., GWAS); sports genomics is a relatively new scientific discipline focusing on the organization and functioning of the genome of elite athletes (Ahmetov and Fedotovskaya, 2015). Authoritative databases, such as PubMed and Web of Science, can be used to search for the genes related to a phenotypic trait and retrieve several SNPs that meet the requirements. If the population to be studied is roughly the same as the population retrieved, the SNPs can be directly used as candidates for subsequent research. If the study population is inconsistent with the population retrieved, the SNPs can be further screened to eliminate those that have not been reported or have no reported frequency distribution difference, to narrow the range and reduce the cost of research. For example, suppose a gene is found in the literature and five SNPs of this gene may be associated with a phenotypic trait. If two of these SNPs have a reported frequency distribution difference in the study population, and the other three have no frequency difference or no reports, then only the two SNPs will be included. From the results of our study, these selection criteria are feasible.
Frontiers in Genetics | www.frontiersin.org October 2021 | Volume 12 | Article 726552
The three SNPs can be genotyped in athletes and scored for predicting and identifying power performance. By using the two models, the coach can judge athletes' power performance potential in talent identification and distinguish power performance status in adult athletes, providing a reference for talent identification and sports training. Scores of the three SNPs can be put into the predictive model (nomogram) for adolescent athletes. A total nomogram score above the cutoff of 130 suggests that the athlete has a good development potential of power performance. The measured values of BMI and SVJ and the scores of the three SNPs can be put into the identifying model (nomogram) for adult athletes. A total nomogram score above the cutoff of 148 suggests that the athlete may have a high level of power performance.
The application of the models will contribute to the fields of talent identification and sports training to a certain extent. Using the cutoff values of 130 and 148, the work of coaches will become intuitive and simple. Coaches can intuitively choose adolescents with a good power performance development potential to engage in power-oriented events when the total score from the nomogram of the predictive model is greater than 130 points. When the total score from the nomogram of the identifying model is greater than 148, it shows that athletes have high level athletic status of power performance, and coaches can reasonably plan sports training programs according to the results.
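The decision rule described above can be written as a short sketch. The cutoffs 130 and 148 are taken from the text; the function and dictionary names are hypothetical, and the total score is assumed to have been read from the corresponding nomogram beforehand.

```python
# Cutoff scores stated in the text for the two nomograms
CUTOFFS = {
    "predictive": 130,   # adolescent athletes: development potential
    "identifying": 148,  # elite athletes: current power status
}


def exceeds_cutoff(total_points, model):
    """Return True when the nomogram total score is above the model's cutoff."""
    return total_points > CUTOFFS[model]


# Hypothetical nomogram totals (illustration only)
adolescent_has_potential = exceeds_cutoff(142, "predictive")
elite_has_high_status = exceeds_cutoff(139, "identifying")
```

A score exactly equal to the cutoff is treated as not exceeding it, following the "greater than" wording in the text.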
In conclusion, our results suggest that elite power performance can be predicted and identified by using polygenic profiles in Chinese athletes, according to which, two models have been created and used for talent identification in Chinese athletes. Out of the two models, the predictive one can be used in adolescent athletes to predict the development potential of power performance and the identifying one can be used in elite athletes to distinguish and evaluate power athletic status. They can be applied quickly and visually by using the method of nomogram.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Materials; further inquiries can be directed to the corresponding authors.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the ethics committee of the School of Life Sciences, Fudan University (Shanghai, China). The patients/participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
RY and FJ: writing the manuscript, data analysis and interpretation; LW and XS: data analysis and interpretation, reviewing a draft of the manuscript; QG, HS and JH: manuscript revision; QZ and JW: research concept and study design, reviewing a draft of the manuscript; MC: statistical analyses, editing a draft of the manuscript.