The Application of Multi-Locus GWAS for the Detection of Salt-Tolerance Loci in Rice

Improving the salt-tolerance of direct-seeding rice at the seed germination stage is a major goal of breeders. Efficiently identifying salt tolerance loci will help researchers develop effective rice breeding strategies. In this study, six multi-locus genome-wide association studies (GWAS) methods (mrMLM, FASTmrMLM, FASTmrEMMA, pLARmEB, pKWmEB, and ISIS EM-BLASSO) were applied to identify quantitative trait nucleotides (QTNs) for the salt tolerance traits of 478 rice accessions with 162,529 SNPs at the seed germination stage. Among the 371 QTNs detected by the six methods, 56 were identified by at least three methods. Among these 56 QTNs, 12, 6, 7, 4, 13, 12, and 12 were found to be associated with SSI-GI, SSI-VI, SSI-MGT, SSI-IR-24h, SSI-IR-48h, SSI-GR-5d, and SSI-GR-10d, respectively. Additionally, 66 candidate genes were identified in the vicinity of the 56 QTNs, and two of these genes (LOC_Os01g45760 and LOC_Os10g04860) are involved in auxin biosynthesis according to the enriched GO terms and KEGG pathways. This information will be useful for identifying the genes responsible for rice salt tolerance. A comparison of the six methods revealed that ISIS EM-BLASSO identified the most co-detected QTNs and performed best, with the smallest residual errors and highest computing speed, followed by FASTmrMLM, pLARmEB, mrMLM, pKWmEB, and FASTmrEMMA. Although multi-locus GWAS methods are superior to single-locus GWAS methods, their utility for identifying QTNs may be enhanced by adding a bin analysis to the models or by developing a hybrid method that merges the results from different methods.


INTRODUCTION
A genome-wide association studies (GWAS) represents a powerful option for the genetic characterization of quantitative traits, and has been widely used for analyzing agronomic traits related to plants. Numerous genetic variants for complex traits have been identified based on single-locus GWAS methods, such as empirical Bayes, efficient mixed model association (EMMA), genome-wide efficient mixed linear model association (GEMMA), settlement of mixed linear model under progressively exclusive relationship (SUPER), and mixed linear model (MLM) (Kang et al., 2008;Zhou and Stephens, 2012;Wang et al., 2014Wang et al., , 2016a. Although the statistical power of quantitative trait nucleotide (QTN) detection improves after controlling the polygenic background, most of the small effects associated with complex traits are still not captured by single-locus GWAS methods.
In a single-locus GWAS model, markers are tested individually in a one-dimensional genome scan. Moreover, the multiple test correction for the critical value of a significance test should be considered. Bonferroni correction is widely used to modify the threshold value to control the false positive rate (FPR). However, this type of correction method is so conservative that true QTNs may be eliminated. Therefore, the best way to solve this problem is to develop a multi-locus GWAS method that does not require a multiple test correction. Multi-locus GWAS methods involve a multi-dimensional genome scan, in which the effects of all markers are simultaneously estimated. Many penalized multilocus GWAS methods have been developed, including the least absolute shrinkage and selection operator (LASSO), empirical Bayes LASSO, and adaptive mixed LASSO (Yi and Xu, 2008;Cho et al., 2009Cho et al., , 2010Wu et al., 2009;Ayers and Cordell, 2010;Wang et al., 2010;Giglio and Brown, 2018). These methods can minimize some marker effects to zero when the number of single nucleotide polymorphisms (SNPs) is not much larger than the sample size. However, the rapid development of sequencing technologies has enabled the detection of many SNPs (i.e., the number of SNPs is hundreds of times larger than the sample size). Thus, the available methods for minimizing marker effects are ineffective. One option for addressing this issue involves decreasing the number of SNPs. Dr. Zhang' lab developed an R package called mrMLM, which includes the following six multilocus GWAS methods: mrMLM, FASTmrMLM, FASTmrEMMA, pLARmEB, pKWmEB, and ISIS EM-BLASSO. All of these methods involve two-step algorithms. During the first step, a single-locus GWAS method is applied to scan the entire genome, and putative QTNs are detected according to a less stringent critical value, such as P < 0.005 or P < 1/m, where m is the number of markers. During the second step, all selected putative QTNs are examined by a multi-locus GWAS model to detect true QTNs (Wang et al., 2016a,b;Tamba et al., 2017;Zhang et al., 2017;Ren et al., 2018;Wen et al., 2018a,b;Zhang and Tamba, 2018). The mrMLM package solves the problem associated with co-factor selection in the multi-locus GWAS model when there are many markers.
Rice (Oryza sativa L.), which is one of the most important cereal crops worldwide, is sensitive to salt stress. With the increasing salinization of soils, salt stress is becoming a key abiotic factor limiting rice production that rice breeders must overcome (Hu et al., 2012). Developing salt-tolerant rice cultivars is an efficient way to minimize crop loss. Over the past several years, high density SNPs have been used to detect variants with GWAS methods to improve rice varieties (Han and Huang, 2013;Chen et al., 2014;Yang et al., 2014;Wei et al., 2017). However, most traits related to abiotic stress tolerance are controlled by several polygenes that are undetectable in single-locus GWAS models (Lee et al., 2003;Cui et al., 2015). Therefore, we should apply multi-locus GWAS methods to identify loci related to salt tolerance. In this study, 478 rice accessions, each with seven salt stress susceptibility index (SSI)-related traits, and 162,529 SNPs were used to conduct a multi-locus GWAS. Our objectives were to identify the significant QTNs related to salt tolerance and provide recommendations regarding the selection of a multilocus GWAS method by comparing the differences among the six multi-locus methods included in the mrMLM package.

Rice Phenotypic Data Related to Salt Tolerance
We analyzed 478 rice accessions from 46 countries and regions regarding seven salt tolerance-related traits at the seed germination stage in a multi-locus GWAS. Phenotypic data were collected for control and stress-treated plants incubated in a growth chamber, with two independent experiments conducted for the control and stress treatments. Each independent experiment involved a randomized block design with two replicates. The dataset was published by Shi et al. (2017), and the seven salt tolerance-related traits were VI, GI, germination rate (GR) at days 5 and 10, MGT, and imbibition rate (IR) at 24 and 48 h. All salt tolerance-related traits were measured for plants treated with 60 mM NaCl or water (control) as follows: IR (mg/g) was calculated as IR = (W 2 − W 1 )/W 1 × 1000 at 24 and 48 h after starting the incubation, where W 1 represents the dry seed weight and W 2 represents the imbibed seed weight; GR was calculated as GR = N t /N 0 × 100% at days 5 and 10, where N t is the number of germinated seeds at day t and N 0 is the total number of seeds; GI was calculated as GI = (G t /T t ), where G t is the accumulated number of germinated seeds at day t and T t is the time (in days); MGT was calculated as MGT = T i N i / N i , where N i is the number of newly germinated seeds at day t and T i is the time (in days); VI was calculated as VI = GI × SL, where SL is the average shoot length of 10 germinated seeds at day 10. The salt tolerance level of rice during the germination stage was estimated with the following equation: SSI = (1 − Y s /Y p )/D, where Y s is the performance of an individual under the stress condition, Y p is the performance of an individual under the normal condition, and D is the stress intensity, which was calculated as D = 1 − ( Y s / Y p ). Finally, 21 traits were included in this study. The abbreviated names of these 21 traits are provided in the abbreviations list.

Genotyping and Multi-Locus GWAS
The 478 rice accessions analyzed in this study were from the 3K rice genome project. The 3K rice genome project 404K coreSNP dataset from the Rice-Seek Database was downloaded from http://snp-seek.irri.org/_download.zul (Alexandrov et al., 2015). We used the PLINK program (version 1.9) (Purcell et al., 2007) to obtain a subset of 162,529 SNPs with a minor allele frequency > 5% and a missing data ratio < 0.1 for association analyses. The kinship matrix (K matrix) was calculated based on the genotype marker information described by Xu (2013). The mrMLM package, including six multi-locus GWAS methods, was downloaded from http://cran.r-project.org/web/packages/ mrMLM/index.html. Default values were used for all parameters.

Annotation of Candidate Genes and Pathway Enrichment Analysis
Synonymous and non-synonymous SNPs and SNPs associated with large-effect changes were annotated using the snpEff program (version 4.0) (Cingolani et al., 2012) based on the gene models of the annotated Nipponbare reference genome (IRGSP 1.0) (Kawahara et al., 2013). All putative SNPs located within genes and annotation details have been published (Kawahara et al., 2013). Enriched gene ontology (GO) terms and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways were identified using the agriGO (version 2.0) (Tian et al., 2017) and EXPath 2.0 (Chien et al., 2015) programs, respectively.

Heritability and Variance
The heritability and residual errors estimated by the six multilocus GWAS methods are presented in Table 1. The narrow sense heritability ranged from 0.17 for S_MGT and 0.57 for S_IR_48h. A comparison of the residual errors among the six multi-locus GWAS models revealed that the residual error estimated by FASTmrEMMA was the largest under the normal condition when the phenotypic variation was larger than 10. Under the salt stress condition, the largest residual errors for traits S_IR_24h and S_IR_48h were observed from FASTmrEMMA. Regarding the SSI-related traits, the largest residual error was estimated by FASTmrEMMA. The salt tolerance level was evaluated according to the SSI-related traits. Lower SSI values indicated a higher tolerance to salt stress. The results of the correlation analyses of the seven SSI-related traits are presented in Figure 1A. There were significant positive correlations among SSI_VI, SSI_GR_5d, SSI _GR_10d, and SSI_GI. The correlation coefficients between SSI-VI and the other three SSI-related traits, namely SSI_GR_5d, SSI_GR_10d, and SSI_GI, were 0.91, 0.91, and 0.96, respectively. Meanwhile, the correlation coefficients for SSI_GR_5d, SSI_GR_10d, and SSI_GI were 0.89, 0.95, and 0.96, respectively. The high correlation among the four SSI-related traits implied that some novel loci might be simultaneously detected for different traits.

Validation of the Common QTNs
Among the 56 QTNs, 14 were identified by at least five methods, of which four, three, two, four, and one were associated with SSI_GI, SSI_GR_5d, SSI_GR_10d, SSI_IR_48h,  and SSI_MGT, respectively. We divided the population into two groups according to allelic genotypes to test whether the mean phenotypes of the two groups were significantly different. The mean value of the group carrying the favorable allele was less than that of the other group (Figure 2).

GO and KEGG Pathway Enrichment Analyses
According to the Nipponbare reference genome, the 371 identified QTNs for traits related to salt tolerance were part of or were adjacent to 581 genes (Supplementary Table S1). These genes were significantly enriched for GO biological processes related to the plant lipid metabolic process and transmembrane transport process (Supplementary Table S3). They were also significantly enriched for the plant tryptophan metabolism pathway (P < 0.03). Moreover, two genes (LOC_Os01g45760 and LOC_Os10g04860) were associated with auxin biosynthesis. A total of 66 genes were identified around the 56 QTNs based on the enriched GO terms and KEGG pathways as well as the functional annotations (Supplementary Table S4). This information may be very useful for identifying the genes responsible for salt tolerance in rice.

DISCUSSION
Multi-locus GWAS models, which are relatively close to the true genetic models of plants and animals, are superior to singlelocus GWAS models because of their higher statistical power and lower FPR (Segura et al., 2012;Wang et al., 2016a). These models were developed by geneticists, who added the polygenic effect and population structure to the single-locus GWAS model to decrease the bias in effect estimations by controlling the genetic background (Zhang et al., 2005;Yu et al., 2006;Zhang et al., 2010). Although advancements in the single-locus GWAS models have improved the detection accuracy to some extent, the multiple test correction for the threshold value of the significance test in single-locus models (e.g., Bonferroni correction) is too stringent to capture all true QTNs. Another unavoidable problem is that single-locus GWAS methods are inappropriate when the target traits are controlled by a series of polygenes. In this study, 478 rice accessions with 162,529 SNPs were used to identify QTNs for traits related to salt tolerance based on six multi-locus GWAS methods. We compared the QTNs identified by the multilocus GWAS methods in our study with the previously reported QTNs detected by the efficient mixed-model EMMA eXpedited (EMMAX) program comprising a single-locus GWAS method. The comparison revealed that four of the previously reported six QTNs related to SSI-VI were detected by a multi-locus GWAS, and two QTNs associated with SSI-MGT overlapped with the previously reported QTNs. Additionally, 12, 4, 13, 12, and 12 QTNs separately associated with SSI-GI, SSI-IR-24h, SSI-IR-48h, SSI-GR-5d, and SSI-GR-10d, respectively, were simultaneously detected by at least three multi-locus GWAS methods. In contrast, none of the QTNs associated with the five traits were identified by a single-locus GWAS method. These observations were as expected, and can be explained by the following two points: (i) salt tolerance is a quantitative genetic characteristic that is controlled by multiple genes with small effects, which are difficult to detect in a single-locus GWAS model (Wang et al., 2011;Kumar et al., 2015); (ii) some true QTNs for traits related to salt tolerance are missed by a single-locus GWAS model because of an overly conservative critical value. Furthermore, our results suggest that a multi-locus GWAS model may be useful for detecting loci with small effects.
In this study, we used six multi-locus GWAS methods included in the mrMLM package to detect QTNs. The six methods involve two-step algorithms, and marker effects are treated as random effects in each method. However, each method has its own characteristics. We observed that mrMLM detected the most QTNs (Supplementary Table S1), but this method has one shortcoming. When the number of putative QTNs is much larger than the sample size, the multi-locus model in this method will be over-fitted. The residual error estimated by mrMLM was FIGURE 2 | Boxplot for validating 14 co-detected QTNs (A-N). For each QTN, the population was divided into two groups according to allele types. The X-axis represents the two alleles for each QTN, while the Y-axis corresponds to the phenotype. much smaller than that estimated by the five other methods ( Table 1). During the first step, 7,588 QTNs with a threshold value P < 0.01 were selected, which is 16 times larger than the sample size. Over-fitting may occur when too many variables are added to a multi-locus model. This issue was solved by using FASTmrMLM, in which the least angle regression (LARS) algorithm is implemented between the first single-locus scanning step and the EM-Empirical Bayes estimation in the second step. The LARS algorithm (Efron et al., 2004) is a flexible method for selecting variables, and can be applied in the lars package 1 . In this method, n−1 variables (n is the number of samples), which are most likely associated with the target traits, are added to the multi-locus model.
The FASTmrEMMA method detected the fewest QTNs. This method involves an approximation algorithm in which the covariance matrix of the polygenic matrix K and environmental noise are whitened by a matrix transformation to increase the computing speed. In the pLARmEB method, the same transformed model as that used in FASTmrEMMA is implemented to control the polygenic background, and the LARS algorithm is applied to select potential SNPs related to the target trait for the subsequent multi-locus GWAS detection. Among the six multi-locus GWAS methods, ISIS-EM-BLASSO had the shortest running time and the smallest estimated residual errors (Supplementary Figure S2 and Table 1). In the first step of 1 http://cran.r-project.org/web/packages/lars/ this method, an iterative-modified sure independence screening (ISIS) approach is used to decrease the number of SNPs to a moderate level, after which the Expectation-Maximization (EM)-Bayesian least absolute shrinkage and selection operator (BLASSO) is used to estimate all of the selected SNP effects to detect true QTNs. The last method, pKWmEB, is a nonparametric method, in which a Kruskal-Wallis test and the LARS algorithm are used to identify potential SNPs. All identified markers are added to the multi-locus model to detect true QTNs.
The two-step multi-locus GWAS methods included in this study significantly improved the statistical power and decreased the FPR. Moreover, ISIS EM-BLASSO identified the most codetected QTNs, followed by pKWmEB, while FASTmrEMMA identified the fewest QTNs (Table 2). Additionally, ISIS EM-BLASSO performed best, with the smallest estimated residual errors and highest computing speed. However, selecting an appropriate critical value is still problematic for the two-step multi-locus GWAS model. A threshold value that is too stringent will lead to the omission of loci information, whereas a relaxed threshold value will result in numerous loci being selected, which may lead to the over-fitting of multi-locus models. A simple solution to this problem involves developing a hybrid method that combines the results from different methods. Directly decreasing the number of SNPs instead of applying a single-locus GWAS scanning step represents another potential solution. We are currently developing a new bin analysis method that can be applied to any type of population. In the bin analysis method, the number of markers is decreased, but the information for all markers is fully retained. Adding a bin analysis to the multi-locus GWAS model represents a new option.

CONCLUSION
In this study, six multi-locus GWAS methods were used to detect loci related to rice salt tolerance at the seed germination stage. A total of 371 QTNs were identified, with 56 QTNs co-detected by at least three methods. Moreover, 66 genes were identified in the vicinity of the 56 QTNs based on functional annotations. Two of these genes (LOC_Os01g45760 and LOC_Os10g04860) are involved in auxin biosynthesis according to the enriched GO terms and KEGG pathways. These observations may be useful for identifying the genes responsible for rice salt tolerance.

AUTHOR CONTRIBUTIONS
YC drafted the manuscript. FZ and YC analyzed the data. YZ and FZ conceived the study and were in charge of the direction and planning. All authors read and approved the final version of this manuscript.
FUNDING FIGURE S1 | Bar plot of the number of QTNs associated with seven salt tolerance traits detected by different methods.