Improving power of genome-wide association studies via transforming ordinal phenotypes into continuous phenotypes

Introduction Ordinal traits are important complex traits in crops, while genome-wide association study (GWAS) is a widely-used method in their gene mining. Presently, GWAS of continuous quantitative traits (C-GWAS) and single-locus association analysis method of ordinal traits are the main methods used for ordinal traits. However, the detection power of these two methods is low. Methods To address this issue, we proposed a new method, named MTOTC, in which hierarchical data of ordinal traits are transformed into continuous phenotypic data (CPData). Results Then, FASTmrMLM, one C-GWAS method, was used to conduct GWAS for CPData. The results from the simulation studies showed that, MTOTC+FASTmrMLM for ordinal traits was better than the classical methods when there were four and fewer hierarchical levels. In addition, when MTOTC was combined with FASTmrEMMA, mrMLM, ISIS EM-BLASSO, pLARmEB, and pKWmEB, relatively high power and low false positive rate in QTN detection were observed as well. Subsequently, MTOTC was applied to analyze the hierarchical data of soybean salt-alkali tolerance. It was revealed that more significant QTNs were detected when MTOTC was combined with any of the above six C-GWAs. Discussion Accordingly, the new method increases the choices of the GWAS methods for ordinal traits and helps to mine the genes for ordinal traits in resource populations.


Introduction
The hierarchical data (HData), phenotypic data for ordinal traits, is commonly used to describe many important traits in crop germplasm resources.This includes count data for quantitative traits and hierarchical data for resistance traits, such as the number of main stem nodes (Chang et al., 2018), the number of branches (Shim et al., 2019), and disease resistance (Megerssa et al., 2020).Ordinal traits are important in crop breeding and have a considerable impact on crop yield and quality.Genome-wide association studies (GWAS) for ordinal traits can further promote the mining of relevant excellent genes, which plays a key role in molecular design breeding and gene cloning.Cuevas et al. (2018) divided the degree of infection of anthracnose-inoculated sorghum leaves into five levels and identified three loci for anthracnose resistance in chromosome 5 using the GWAS methods.Chang et al. (2018) detected three loci significantly associated with "the number of nodes on the main stem" in 368 soybean cultivars with 62,423 SNPs.Meanwhile, Shim et al. (2019) identified five quantitative trait nucleotides (QTNs) for soybean branch number via GWAS and linkage analysis and mined a candidate gene Glyma.06g210600.
Ordinal traits are discrete traits that are controlled by multiple genes.However, their phenotypic data is hierarchical and noncontinuous and contains relatively limited information; accordingly, GWAS for ordinal traits is more complex than that for continuous quantitative traits.The threshold model represents a reasonable method for the genetic analysis of ordinal traits, and most association mapping methods are developed under this framework (Xu et al., 2005;Osval et al., 2015).Generalized linear model is based on the threshold model and link phenotypic data with latent variables through a link function.They are widely used for genetic analysis of ordinal traits and can deal with non-normal data (Feng et al., 2013;Song et al., 2016;Wang et al., 2018).The logistic regression model is another classical way for dealing with association studies of ordinal traits (Tan et al., 2007;Hoggart et al., 2008;Wu et al., 2009;Jiang et al., 2021).When sample size is limited, the application of a set-valued (SV) system model can improve the statistical power and the accuracy of parameter estimation (Bi et al., 2015).Bayesian and maximum likelihood methods are both widely used for parameter estimation in GWAS (Xu et al., 2005;Hoggart et al., 2008;Wang et al., 2018), while several studies have also employed non-parametric methods for association analysis of ordinal traits (Sun et al., 2016;Wang et al., 2017;He and Kulminski, 2020).However, most of them were either single-locus or were only suitable for the analysis of binary traits, and they had very few applications in crop.GWAS for continuous quantitative traits and single-locus methods are currently the main methods used for association analysis of ordinal traits; however, both have low power in QTN detection.
Accordingly, in this study, we proposed a method for transforming ordinal phenotypes into continuous phenotypes (MTOTC).First, the hierarchical phenotypic data for ordinal traits (HData) was transformed into continuous phenotypic data (CPData).Subsequently, FASTmrMLM (Tamba and Zhang, 2018), one GWAS method suitable for continuous quantitative traits, was used to perform GWAS for CPData.In Monte Carlo simulation studies, we validated the feasibility of the new method through the statistical power, false-positive rate in QTN detection and the accuracies for the estimates of QTN effects and positions, and obtained the number of hierarchical levels suitable for MTOTC +FASTmrMLM.The new method was validated by re-analyzing the salt-alkali resistance traits in soybean germplasm resource population of Zhang et al. (2014) and Zhou et al. (2015).This study provides more choices for association analysis of ordinal traits and helps to identify excellent genes for important complex traits in crops.

Theory and methods
Here we proposed a method, named MTOTC, to transform the discrete hierarchical data (HData) of ordinal traits into continuous phenotypic data.Then, GWAS for continuous quantitative traits (C-GWAS) are used to analyze the transformed continuous phenotypic data.The new method was described as below.

Genetic mapping population
In Monte Carlo simulation studies, 199 Arabidopsis thaliana lines harboring 10,000 SNPs with a minimum allele frequency >0.1 (Atwell et al., 2010) were selected as the genetic mapping population.For real data analysis, the population was comprised of 286 soybean cultivars assessed for salt-alkali tolerance, the phenotypic data consisted of the main root length index in 2009 and 2010 (Zhang et al., 2014), and the marker data were 54,296 high-quality SNP markers present in Zhou et al. (2015).

Method for transforming ordinal phenotypes into continuous phenotypes
To transform ordinal phenotypes into continuous phenotypes, we proposed the MTOTC method.In detail, the Chi-square test and logistic regression were used to initially select the SNPs that were significantly related to the trait.Subsequently, these significant SNPs and ordinal phenotypes were used to construct a multilocus model, Bayesian method was used to estimate the SNP effects, and the effect estimates were used to predict the continuous phenotypic data (CPData).This is MTOTC.Then, the

FIGURE 1
Technology framework of the MTOTC method in this study.Yang et al. 10.3389/fpls.2023.1247181Frontiers in Plant Science frontiersin.org predicted CPData is analyzed by C-GWAS methods, such as FASTmrMLM (Figure 1).

The Chi-square test and logistic regression
The Chi-square test in R 4.0.5 (function "chisq.test")was used to scan the SNPs in the whole genome using a single marker method (P-value ≤0.05).To further improve the quality of the significant correlated SNPs in the initial screening for reducing interference and improving detection accuracy, logistic regression was used as a secondary SNP screening method.Logistic regression was performed using function "glm" (2 hierarchical levels) and "polr" (the number of hierarchical levels greater than 2) with a P-value ≤0.05.The aim of this step was to further eliminate SNPs that were not associated with the traits for simplifying the iterations in the following multi-locus genetic model.

Multi-locus genetic model
Based on the potentially associated markers identified in the above-described initial screening, a multi-locus model was established to transform ordinal phenotypes into continuous phenotypes.The linear model is expressed as: where y represents n Â 1 ordinal phenotype vector, with n representing sample size; W = (w 1 , w 2 , …, w c ) represents n Â c matrix of covariates (fixed effects), including a column vector of 1 and population structure, and represents c Â 1vector of fixed effects, including intercept; X i and represent respectively n Â 1genotype vector and effect of the i-th potential associated SNP; q represents the number of SNPs selected in the initial screening step; ϵ e MVN n (0, s 2 e I n ) represents n Â 1error vector.The population structure Q matrix used in the linear model was calculated using Structure software (Pritchard et al., 2000).Based on the Q matrix, the population is divided into corresponding subgroups, and the optimal subgroup number K value is determined according to the corresponding standard, yielding the final Q matrix.The optimal value of the Arabidopsis population structure was calculated as K=2, and the optimal value of the saltalkali tolerant soybean population structure in the actual study was K=3.

Parameter estimation
In the second step of the novel method, a multi-locus linear mixed model for transforming ordinal phenotypes into continuous phenotypes was established, based on the empirical Bayesian algorithm (Xu, 2010).And significant loci were screened in threshold value LOD=3.0.
In model (1), set b i to obey the following prior normal distribution: The parameters were estimated using empirical Bayes, as follows, and the Newton-Raphson method.
Among them, Then, the empirical Bayesian estimates of these SNPs effects were obtained in the multi-locus model (1) based on the selected significant SNP markers and ordinal phenotype, and estimates of these effect were used to predict the phenotype, obtaining the continuous phenotypic data (CPData) of ordinal trait.

GWAS with MTOTC method for ordinal trait
When continuous phenotypic data was obtained by the above MTOTC method, a C-GWAS method could be used to detect significant loci.In this work, FASTmrMLM, one C-GWAS method, was used.So loci significantly associated with ordinal traits were detected by FASTmrMLM using the obtained continuous phenotypic data and the potential associated markers identified in the above-described initial screening.The GWAS method is henceforth referred to as MTOTC+FASTmrMLM.Moreover, the effects of five other C-GWAS (FASTmrEMMA, mrMLM, ISIS EM-BLASSO, pLARmEB, and pKWmEB) methods are also discussed based on the MTOTC method for ordinal trait, in order to verify the feasibility of MTOTC.

Monte Carlo simulation datasets for ordinal trait
We conducted six simulation studies to evaluate the feasibility of the new method.For each study, the loci 278, 2143, 2054, 3698, 1716, 6178, and 8501, located on chromosomes 1, 2, 2, 2, 1, 4, and 5, respectively, were selected as the causal loci related to the simulated trait.There were three types of phenotypic data in the simulation experiment-original data (OData), which were continuous and generated by Monte Carlo simulation; HData, which were generated from the above OData according to specific distribution proportions (i.e., classification proportion of phenotype distribution); and CPData, which were generated from the above HData by MTOTC.Then, FASTmrMLM, one multilocus C-GWAS algorithm, was used to conduct GWAS for CPData.

Threshold value in the initial screening
To determine the most suitable threshold value for the Chisquare test and logistic regression in the initial screening, four probability thresholds (0.0001 [i.e., 1/SNP number], 0.01, 0.05, and 0.10) were set for the Chi-square test in the first simulation study, while three probability thresholds (0.0001 [i.e., 1/SNP number], 0.01, and 0.05) were set for logistic regression.The Chi-square test can eliminate a large number of SNPs that are not significantly related to a given phenotype.However, the simulation study showed that some SNPs screened in the above Chi-square test (those with a P-value >0.98 and an unusually large absolute value of effect estimate in logistic regression) were not truly related to the phenotype and interfered greatly with subsequent association analysis.Therefore, to further improve the quality of the screened significantly related SNPs and detection accuracy, logistic regression was used as a secondary screening method for SNPs in MTOTC.
In the Chi-square test, the single-locus retention rate decreased with decreasing P-values (i.e., threshold values) (Figure 2A).For instance, the single-locus retention rate at loci 278 and 2143 with Pvalues of 0.05 and 0.10 was as high as 96.62%~99.68%,which are very close.When the P-value was 0.01, the single-locus retention rate began to decrease, and when the P-value was 0.0001, the retention rate dropped to between 59.06% and 68.45%.Moreover, the total retention rate (i.e., the proportion of retained loci among the total loci after chi-square test screening) was the lowest when the P-value was 0.0001, followed by 0.01, 0.05, and 0.10 (Figure 3A).
In logistic regression after the Chi-square test, the single-locus retention rate was the highest when the P-value was 0.05 (Figure 2B).For instance, the retention rates of loci 278 and 2143 were as high as 97.56%~99.68%when the P-value was 0.01 or 0.05; when the P-value was 0.0001, the retention rate dropped to between 60.51% and 69.88%.Additionally, the total retention rate was the lowest (only 0.22%) when the P-value was 0.0001, followed by 0.01 and 0.05 (Figure 3B).
Owing to too low single-locus retention rate at the P-values of 0.01 and 0.0001, the two P-values were unsuitable as a threshold for initial screening.Although the total retention rate was high when the P-value was 0.10, this P-value retains more loci that are not associated with the trait, in which it did not contribute to simplifying the model.Therefore, the probability threshold P=0.05, which is commonly used in statistics, was selected as the probability threshold for the Chi-square test and logistic regression of the initial screening in this study.In addition, we also investigated the effect of threshold value on the single-locus retention rate and the total retention rate under different proportions distribution in binary data and the similar results were observed.

MTOTC+FASTmrMLM displayed greater power than other classical mapping methods
In Monte Carlo simulation studies, the GWAS results of hierarchical data using MTOTC+ FASTmrMLM were compared with those using two classical mapping methods (Chi-square test and logistics regression) (Table 1).The results showed that these methods had greater power at the three loci 278, 2143, and 3698, but had less power (<10%) at the other four loci.Compared with the two classical mapping methods, MTOTC+FASTmrMLM had higher power at the three loci 278, 2143, and 3698, and lower false-positive rate, when the number of hierarchical levels of HData was ≤4.The power of the classical methods was higher in a few instances, it was less than 1.5-fold that of MTOTC+FASTmrMLM, but their false-positive rates were 6.8-9.5-foldhigher than that of MTOTC+FASTmrMLM.In addition, the results showed that when the number of hierarchical levels was<5, MTOTC+FASTmrMLM was more suitable for HData analysis as compared with FASTmrMLM alone.Moreover, in Table 1, MTOTC +FASTmrMLM had a relatively higher F1 score, especially for binary data (HData with two hierarchical levels).Here the F1 The effect of threshold value on the single-locus retention rate after the initial screening.(A) is the single-locus retention rate after chi-square test screening; (B) is the single-locus retention rate after logistic regression screening.
score combines the precision and recall, it is used to effectively measure the accuracy of the statistical methods and balance power and FPR.Therefore, MTOTC is recommended for the analysis of HData under four or fewer hierarchical levels.

The effect of the number of hierarchical levels on the new method
The third simulation study investigated the effect of the number of hierarchical levels on MTOTC.Based on symmetrical distribution, the number of hierarchical levels was set to 2, 3, 4, and 5, respectively, and the number of replicates was 10,000.
Meanwhile, we compared the results of OData, HData and CPData using FASTmrMLM.
Compared with CPData from the other hierarchical levels, the distribution of CPData2 (i.e., the CPData converted from the HData of 2 hierarchical levels by MTOTC) was closer to the original data (OData).First, the frequency distribution of the CPData was closer to that of the OData when the hierarchical level was low, especially when it was equal to 2 (Figure 4).As the number of hierarchical levels increased, the peak of CPData began to shift to the right and was far from the peak of the OData, which was expected to affect the GWAS results.The frequency distribution of the OData and the The effect of threshold value on the total retention rate after the initial screening.(A) is the total retention rate in chi-square test; (B) is the total retention rate in logistic regression.corresponding CPData with different hierarchical levels in the 10th and 613th replicates, randomly selected out of the 10,000 replicates using the uniformly distributed random number generator in R, is shown in Figure 4. Second, the range of the coefficient of variation (CV) of the OData was between 29.5% and 55.5%.Among the 10,000 replicates, the number of replicates beyond the CV range of the OData (4.09%, 18.94%, 21.47%, and 25.37% of CPData2, CPData3, CPData4, and CPData5, respectively) also increased with increasing hierarchical level.Thus, the CV range of CPData2 was the closest to that of the OData.Third, among the 10,000 replicates, the skewness range between the CPData and the OData was the closest at 2 hierarchical levels.Among them, the skewness range of the OData was between −1.00 and 0.46 and the range of CPData2 was between −1.28 and 0.35.As the number of hierarchical levels increased, the skewness of the CPData gradually deviated from that of the OData; the kurtosis showed the same tendency as the skewness.MTOTC performed well for the estimates of QTN position under different numbers of hierarchical levels.The position estimates via MTOTC+FASTmrMLM (i.e., the position estimates of the CPData via FASTmrMLM) were unbiased at loci 278, 2143, and 3698 (Supplementary Table 1).Although the position estimates at loci 2054 and 8501 in CPData2, and at loci 1716 and 6178 in all the CPData were biased, the relative mean absolute deviations of their position estimates were all less than 8.96E-05.The accuracy of the estimates of QTN positions for ordinal traits was significantly improved by MTOTC when the number of hierarchical levels was less than 5, i.e., the estimates of QTN positions for the CPData were better than those for the HData when FASTmrMLM was used (Supplementary Table 1).
The effect of MTOTC on the relative power at loci 278, 2143, and 3698 was the greatest when the number of hierarchical levels is equal to 2 (Supplementary Figure 1).Here, "the effect of MTOTC on the relative power" refers to the increment of the relative power of CPData compared to the relative power of HData.The relative power of the CPData (50%~100%) was significantly higher than that of the HData (22%~88%) and was relatively closer to the power of the OData.When the number of the hierarchical levels of the CPData was less than or equal to 5, the relative power exhibited an increasing trend with increasing the number of hierarchical levels and was significantly superior to that of the HData.

The effect of the number of replicates on the new method
The fourth simulation study assessed the impact of the number of replicates on the estimates of QTN effects and positions, relative power, and false-positive rate using MTOTC+FASTmrMLM.Based on the results of CPData2 (1:1), CPData3 (1:3:1), and CPData5 (1:2:4:2:1), 10 replicates were set at equal intervals from 1,000 to 10,000.As a result, the results across various numbers of replicates at each locus and for each hierarchical levels (CPData2, CPData3, and CPData5) were insignificant (Figure 5).This indicated that the number of replicates did not affect the power, false-positive rate, and the estimates of QTN effects and positions.Therefore, 1,000 replicates were used in subsequent simulation studies.

The performance of MTOTC with different GWAS methods
The HData of ordinal trait were transformed by MTOTC, and the obtained CPData were found to be suitable for association analysis via FASTmrMLM when there were five or fewer hierarchical levels, owing to high power.Meanwhile, similar results were obtained when MTOTC was combined with others methods in the mrMLM software (Zhang et al., 2020) (Supplementary Figure 1; Supplementary Table 1).They were also suitable for GWAS for the CPData of ordinal traits, having the characteristics of high relative power, low false-positive rates, and high accuracy of position and effect estimates.Moreover, similar trends from FASTmrMLM in the simulation experiments with the number of the hierarchical levels and their distribution proportions were observed as well (Supplementary Figure 2).MTOTC + FASTmrMLM had the best performance, followed by mrMLM (Wang et al., 2016), ISIS EM-BLASSO (Tamba et al., 2017), and FASTmrEMMA (Wen et al., 2018); and finally by pLARmEB (Zhang et al., 2017) and pKWmEB (Ren et al., 2018).Therefore, MTOTC can be integrated with different methods to conduct GWAS for ordinal traits.Considering the diversity and complexity of phenotypic data in ordinal traits in practice, multiple methods might be simultaneously used in a complementary manner.Accordingly, MTOTC improves the performance in identifying significant loci for ordinal traits.

Real data analysis
To validate the new method, the salt-alkali tolerant data in 286 soybean accessions obtained in 2009 and 2010 from Zhang et al. (2014) was re-analyzed in this study.The experiments were conducted in a completely randomized Design, and the number of high-quality SNP markers in this population was 54,296 (Zhou et al., 2015).First, MTOTC was applied to obtain the CPData.Then, the index data, HData5 [hierarchical data generated from the index data by 1:1:1:1:1 (Shao, 1986)], CPData2 (continuous phenotypic data generated from HData2 by MTOTC), and CPData5 (continuous phenotypic data generated from HData5 by MTOTC) for salt-alkali tolerance in soybean were analyzed using the mrMLM, ISIS EM-BLASSO, pLARmEB, FASTmrEMMA, pKWmEB, and FASTmrMLM methods.

QTNs significantly associated with soybean salt-alkali tolerance
For the four types of phenotypic data of salt-alkali tolerance, a greater number of significant QTNs were detected in CPData than in the index data or HData.Six GWAS methods mapped 65 and 99 QTNs in CPData2 and CPData5 of salt tolerance traits, respectively, and 134 and 60 QTNs in CPData2 and CPData5 of alkali tolerance traits, respectively.pLARmEB detected a greater number of QTNs in CPData (116 for salt tolerance traits and 166 for alkali tolerance traits) compared with the other five GWAS methods, which may be related to its relatively higher false-positive rate.Additionally, the numbers of significant QTNs detected by pKWmEB, mrMLM, and FASTmrMLM in CPData (44, 25, and 14 for the salt tolerance trait and 25, 21, and 19 for the alkali-tolerance trait, respectively) were second only to the number of QTNs detected with pLARmEB.
Four QTNs (locus 9682 on chromosome 2 [Chr2-9682], Chr11-54042, Chr13-64738, and Chr13-65248) for salt tolerance were simultaneously detected in the index data and at least one CPData; however, none of them was detected in HData5.For instance, Chr13-64738 was simultaneously detected in CPData2 by five methods and in the salt tolerance index data by two methods.Chr13-65248 was detected in CPData5 by four methods and in both CPData5 and the index data by FASTmrMLM.Three QTNs (Chr7-34669, Chr13-67342, and Chr20-105040) for alkali tolerance were simultaneously detected in the index data and in at least one CPData, two of them were also detected in HData5.
The results of six GWAS methods for the CPData of salt-alkali tolerance showed that only a few significant QTNs were coincident between 2009 and 2010, which can be explained by the differences in environmental influences between the two years.For salt tolerance, no QTNs were found to overlap between 2009 and 2010 in the six methods.For alkali tolerance, only Chr1-5051 and Chr16-82333 were detected in both years.There was indeed an environmental (year) effect according to variance analysis of the phenotypic results for the two years (Zhang et al., 2014).

Candidate genes for salt-alkali tolerance
Potential candidate genes were mined from 100 kb upstream to 100 kb downstream (Liu et al., 2020) of significant QTNs that were detected in at least two types of data or by two methods (Tables 2  and 3).Functional annotation information in the SoyBase database (Error!Hyperlink reference not valid.http://www.Soybase.org/)was also used to screen candidate genes.A total of 34 potentially candidate genes for salt tolerance and 25 potentially candidate genes for alkali tolerance were mined.For salt tolerance, 19 candidate genes were detected simultaneously in the index data and CPData5.Among them, Glyma05g25331, Glyma05g25420, Glyma05g25450, and Glyma05g25460 were all detected by five GWAS methods in CPData5 in 2009.Only one gene, Glyma13g25266, was detected in both the index data and CPData2 detected by five GWAS methods in CPData2 and two methods in the index data in 2010.In addition, five candidate genes were detected only in CPData2 by five methods, and nine candidate genes were detected only in CPData5 by three or more methods.No overlapping genes were found between HData5 and the index data or the CPData (Table 2).
For alkali tolerance, 7 candidate genes for alkali stress were concurrently detected in the index data and CPData5.For instance, Glyma07g20380 was simultaneously detected by 2, 1, and 6 GWAS methods in the index data, HData5, and CPData5 in 2010, respectively (Table 3).Two candidate genes were detected in the index data and CPData2.Ten candidate genes were simultaneously detected in CPData2 and CPData5.Glyma10g02920 was detected by one GWAS method in CPData2 and five GWAS methods in CPData5 in 2009.Glyma07g20380 was detected by all six association analysis methods in CPData5 in 2010.

QTN based haplotype and phenotypic difference analysis
Based on the above 34 salt stress-related candidate genes and 25 alkali stress-related candidate genes, Haploview software was used to perform haplotype block analysis.And the phenotypic differences across haplotypes were examined using the t-test in SAS9.4.Four stable QTNs for salt tolerance and six stable QTNs for alkali resistance were screened to form haplotype blocks based on linkage disequilibrium (Supplementary Figures 3 and 4).

Discussion
In this study, we established a method for transforming ordinal phenotypes into continuous phenotypes (MTOTC) based on hierarchical data for ordinal trait phenotypes and molecular marker data in resource populations.Therefore, the process of association analysis for ordinal traits is as follows: first, MTOTC is used to transform HData into continuous phenotypic data (CPData), and then a C-GWAS method (i.e.GWAS method for continuous quantitative traits) is selected to analyze the CPData to identify the QTNs that are significantly associated with ordinal traits.
In this study, simulation experiments and soybean saline-alkali tolerance analysis indicated that the new method, MTOTC, is suitable for ordinal traits when they are less than five hierarchical levels.Moreover, the combination of MTOTC with any one of the proposed C-GWAs methods exhibited high power, low falsepositive rates, and low bias in estimating the positions and effects of the QTN.The purpose of MTOTC is to provide a different approach for undertaking GWAS for ordinal traits.The feasibility of the MTOTC method was verified in real data analysis of soybean salt-alkaline tolerance using 286 soybean accessions.Compared with HData5 (i.e., the data classified as five hierarchical levels), a greater number of significant QTNs was detected concurrently by at least two GWAS methods or in two years, and more candidate genes for salt and alkali stress were screened in the CPData for salt and alkali tolerance traits.A greater number of QTNs was detected simultaneously by multiple GWAS methods in the CPData than in the index data and HData for salt-alkaline tolerance.For the three types of data, the number of QTNs detected simultaneously was respectively 4, 1, and 1 in salt tolerance and respectively 5, 2, and 3 in alkali resistance.When the phenotype distribution of the CPData generated by the new method were closer to those from the index data of salt-alkali tolerance, the GWAS results were better, and a greater number of candidate genes could be mined.This may be beneficial for selecting the appropriate distribution proportion to obtain hierarchical data of ordinal trait, screening stable QTNs, and promoting the development of molecular breeding.We also applied symmetric distribution (1:2:4:2:1) to generate HData5 for the salt tolerance index data and used MTOTC to generate the corresponding CPData5.The phenotype distribution of CPData5 with symmetric 1:2:4:2:1 exhibited a large deviation from that of the index data, and the phenotype distribution of CPData5 with uniform 1:1:1:1:1 was closer to that of the index data.Under the six methods, there were no overlapping QTNs in CPData5 and the index data for salt tolerance, which was far inferior to the above uniform distribution observed with the distribution proportion 1:1:1:1:1, under which three coincident QTNs were detected in CPData5 and the index data.This result corresponded precisely to the results presented in simulation study 5.
MTOTC performed well in the initial SNP screening.After preliminary screening under a P ≤ 0.05 threshold, a large number of SNPs that were significantly unrelated to the trait could be eliminated.Meanwhile, the simulation experiment showed that the retention rates of related loci remained high.MTOTC serves to simplify the model and save a substantial amount of computing time for subsequent association studies.
MTOTC helps to improve association analyses of ordinal traits.Regarding coefficient of variation, skewness, kurtosis, and frequency distribution, compared with the HData, the results obtained for the CPData were closer to those of the OData.Meanwhile, the results using six GWAS methods showed that the statistical power, the false-positive rate, and the position estimates in CPData were better than those in HData.Moreover, MTOTC performed better when the frequency distribution of the CPData was close to that of the OData.
The fewer hierarchical levels, the more suitable MTOTC is.Regarding the relative power in CPData under different hierarchical levels, a trend of increasing relative power with increasing number of hierarchical levels was found for all six methods when there were four or less hierarchical levels.When there were five hierarchical levels, the power of MTOTC+FASTmrMLM was close to that of FASTmrMLM in HData, but slightly lower than the power from logistic regression; only three GWAS methods had higher relative power in CPData than in HData.In addition, MTOTC had a tendency to increase variation, especially with increasing numbers of hierarchical levels.This indicates that MTOTC is more suitable for ordinal traits with fewer hierarchical levels, especially those with two or three levels.Among the six GWAS methods, FASTmrEMMA, FASTmrMLM, and mrMLM are significantly better when combined with MTOTC.This is partly attributed to that the distribution and parameter estimation principles set in MTOTC were relatively consistent with those in these three GWAS models.
This study will contribute to further research in association analysis of ordinal traits.This is especially in improving the retention rate of small-effect loci in preliminary screening, reducing the impact on variability when transforming ordinal phenotypes into continuous phenotypes, and developing novel methods for association analyses of ordinal traits.

4
FIGURE 4The frequency distribution of the OData and the corresponding CPData for different hierarchical levels in the 10th and 613th repetition.(A-D) is the 10th repetition, (E-H) is the 613th repetition.CPData2 transformed from HData of two hierarchical levels by MTOTC; CPData3 transformed from HData of three hierarchical levels by MTOTC; CPData5 transformed from HData of five hierarchical levels by MTOTC.
FIGURE 5The impact of repetition number of simulation experiment on the association analysis results of CPData (2143Locus).(A, B) MSE and MAD of QTN effect at 2143, respectively; (C) false-positive rates; (D) relative power.
FIGURE 6 The effect of phenotype distribution kurtosis on the association detection results of MTOTC+FASTmrMLM.(A, B) MSE and MAD of QTN effect at 2143, respectively; (C) false-positive rates; (D) relative power.

TABLE 1
Comparison of different genome-wide association study methods.

TABLE 2
Salt stress-related candidate genes from six genome-wide association study methods.

TABLE 3
Alkali stress-related candidate genes from six genome-wide association study methods.