Detecting the QTL-Allele System of Seed Oil Traits Using Multi-Locus Genome-Wide Association Analysis for Population Characterization and Optimal Cross Prediction in Soybean

Zhang, Yinghu; He, Jianbo; Wang, Hongwei; Meng, Shan; Xing, Guangnan; Li, Yan; Yang, Shouping; Zhao, Jinming; Zhao, Tuanjie; Gai, Junyi

doi:10.3389/fpls.2018.01793

ORIGINAL RESEARCH article

Front. Plant Sci., 05 December 2018

Sec. Plant Breeding

Volume 9 - 2018 | https://doi.org/10.3389/fpls.2018.01793

Detecting the QTL-Allele System of Seed Oil Traits Using Multi-Locus Genome-Wide Association Analysis for Population Characterization and Optimal Cross Prediction in Soybean

Yinghu Zhang^1,2^†

Jianbo He^1,3,4,5^†

Hongwei Wang¹

Shan Meng¹

Guangnan Xing^1,3,4,5

Yan Li^1,3,4,5

Shouping Yang^1,3,4,5

Jinming Zhao^1,3,4,5,6

Tuanjie Zhao^1,3,4,5,6

Junyi Gai^1,3,4,5,6^*

¹Soybean Research Institute, Nanjing Agricultural University, Nanjing, China
²Jiangsu Coastal Institute of Agricultural Sciences, Yancheng, China
³National Center for Soybean Improvement, Ministry of Agriculture, Nanjing, China
⁴Key Laboratory of Biology and Genetic Improvement of Soybean (General), Ministry of Agriculture, Nanjing, China
⁵State Key Laboratory for Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing, China
⁶Jiangsu Collaborative Innovation Center for Modern Crop Production, Nanjing Agricultural University, Nanjing, China

Soybean is one of the world's major vegetative oil sources, while oleic acid and linolenic acid content are the major quality traits of soybean oil. The restricted two-stage multi-locus genome-wide association analysis (RTM-GWAS), characterized with error and false-positive control, has provided a potential approach for a relatively thorough detection of whole-genome QTL-alleles. The Chinese soybean landrace population (CSLRP) composed of 366 accessions was tested under four environments to identify the QTL-allele constitution of seed oil, oleic acid and linolenic acid content (SOC, OAC, and LAC). Using RTM-GWAS with 29,119 SNPLDBs (SNP linkage disequilibrium blocks) as genomic markers, 50, 98, and 50 QTLs with 136, 283, and 154 alleles (2–9 per locus) were detected, with their contribution 82.52, 90.31, and 83.86% to phenotypic variance, corresponding to their heritability 91.29, 90.97, and 90.24% for SOC, OAC, and LAC, respectively. The RTM-GWAS was shown to be more powerful and efficient than previous single-locus model GWAS procedures. For each trait, the detected QTL-alleles were organized into a QTL-allele matrix as the population genetic constitution. From which the genetic differentiation among 6 eco-populations was characterized as significant allele frequency differentiation on 28, 56, and 30 loci for the three traits, respectively. The QTL-allele matrices were also used for genomic selection for optimal crosses, which predicted transgressive potential up to 24.76, 40.30, and 2.37% for the respective traits, respectively. From the detected major QTLs, 38, 27, and 25 candidate genes were annotated for the respective traits, and two common QTL covering eight genes were identified for further study.

Introduction

Soybean [Glycine max (L.) Merr.] is the world's leading oilseed crop, accounting for 53.9 (29%) million metric tons of world vegetable oil consumption (http://soystats.com/international-world-vegetable-oil-consumption/,2016). The seed oil content in soybean is ~20% on average, and the oil quality is determined by the proportions of five major fatty acids, which include palmitic (C16:0), stearic (C18:0), oleic (C18:1), linoleic (C18:2), and linolenic (C18:3) acid (Wilson, 2004). The unsaturated fatty acids, such as oleic, linoleic and linolenic acids, have positive effects on human health (Bahrami, 2010), while the polyunsaturated fatty acids are not desirable for human consumption. Therefore, great efforts have been made in soybean breeding to increase the seed oil content (SOC) and oleic acid content (OAC), and to decrease the linolenic acid content (LAC) (Panthee et al., 2006).

Breeding progress depends on the potential gene resources in germplasm, such as landraces which were developed historically by farmers. The key for utilization of required genes in the germplasm population is to explore the genetic loci and alleles underlying breeding traits and to identify superior alleles. Soybean originated in China, where the crop has been cultivated for more than 5000 years (Hymowitz, 2004). During the long history, ancient Chinese farmers have developed a great number of landraces which accumulate tremendous genetic variation, and therefore, the Chinese soybean landraces are the most important gene/germplasm reservoirs for breeding programmes (Gai et al., 2012).

Soybean seed oil traits, i.e., the oil and fatty acid content, are complex traits involving a large number of genes. At present, a number of QTLs for seed oil traits have been mapped on 20 chromosomes in soybean based on linkage mapping (Supplementary Table 1). However, linkage mapping is usually applied to a segregating population derived from two parents, and therefore it is limited in terms of allelic diversity and mapping resolution (Zhu et al., 2008). The genome-wide association study (GWAS) is found to be a powerful approach to detect QTL and their multiple alleles at a relatively higher resolution, and it can also be directly applied to natural populations such as germplasm. Although a number of GWASs have been performed for soybean seed oil content (Hwang et al., 2014; Sonah et al., 2015; Wen et al., 2015; Zhou et al., 2015; Cao et al., 2017; Li et al., 2018), only few are for fatty acid composition (Li et al., 2015; Fang et al., 2017; Leamy et al., 2017).

The previous GWAS procedures concentrate on finding a handful of major loci, such as general linear model and mixed linear model (MLM) approaches (Pritchard et al., 2000b; Price et al., 2006; Yu et al., 2006) based on single-locus model, and even MLMM (Segura et al., 2013) and mrMLM (Wang et al., 2016) based on multi-locus model. But plant breeders are more likely interested in exploring the whole QTL-allele system for both forward selection and background control in breeding programs. Furthermore, the previous GWASs are generally based on SNP markers which involve only two alleles at one site, therefore the multi-allelic variation which widely exists in germplasm population cannot be detected. To overcome these limitations, He et al. (2017) proposed an innovative restricted two-stage multi-locus GWAS procedure (RTM-GWAS) for a relatively thorough detection of QTL and their multiple alleles in a germplasm population. In the RTM-GWAS procedure, the tightly linked SNPs are grouped into SNP linkage disequilibrium blocks (SNPLDBs) to form genomic markers with multiple haplotypes as alleles, and then it utilizes two-stage association analysis based on a multi-locus multi-allele model for genome-wide QTL identification along with their multiple alleles. Simulation studies demonstrated that RTM-GWAS achieved the highest QTL detection power and efficiency compared with the previous GWAS procedures, especially under large sample size and high trait heritability conditions. The RTM-GWAS procedure has been applied to identify QTL-allele system of 100-seed weight (Zhang et al., 2015) and seed isoflavone content (Meng et al., 2016) in CSLRP. More recently, Li et al. (2017) applied the RTM-GWAS procedure to a soybean nested association mapping population, and identified 139 flowering date QTLs with 496 alleles, which cover almost all QTLs detected by four other mapping procedures.

Optimal cross design and precise progeny selection are two major steps in conventional plant breeding with the former determining the potential of the latter. Peleman and van der Voort (2003) presented “Breeding by Design” concept based on QTL mapping, aiming to choose parents and design crosses for potential recombination. Meuwissen et al. (2001) proposed genomic selection (GS) as a marker-assisted selection procedure based on genome-wide SNP/markers. GS composes two links, establishing an index between required targets and SNPs/markers from a training population and then using the index in progeny selection based on its genome-wide SNP/marker information. Jonas and de Koning (2013) indicated that GS approaches from dairy cattle breeding cannot be readily applied to complex plant breeding. Therefore, following the “Breeding by Design” concept, GS based on whole-genome QTL-allele system detected from RTM-GWAS seems to be a potential approach for both optimal cross design and precise progeny selection (He et al., 2017).

In the present study, the CSLRP was used to explore QTL-allele constitutions of three major seed oil traits, i.e., SOC, OAC, and LAC, in the most important soybean gene/germplasm reservoir. Accordingly, the QTL-allele matrices were established as a compact form of the population genetic structure of seed oil traits. The matrices were used to characterize the genetic differentiation among ecoregion subpopulations and to select optimal crosses for seed oil improvement in soybean breeding. Accordingly, the candidate gene system was annotated from the detected QTLs for further study on the oil trait genes.

Materials and Methods

Plant Materials and Field Experiments

A sample composed of 366 soybean landraces as representative of CSLRP was used for the present study. The sampled accessions have their origination distributed in the six soybean cultivation ecoregions in China (Gai and Wang, 2001). They are I: Northern Single Cropping, Spring Planting Ecoregion; II: Huang-Huai-Hai Double Cropping, Spring, and Summer Planting Ecoregion; III: Middle and Lower Changjiang Valley Double Cropping, Spring, and Summer Planting Ecoregion; IV: South Central Multiple Cropping, Spring, Summer, and Autumn Planting Ecoregion; V: Southwest Plateau Double Cropping, Spring, and Summer Planting Ecoregion; VI: South China Tropical Multiple, All Season Planting Ecoregion (Table 1). This population has been used in the establishment of QTL-allele matrix for the 100-seed weight (Zhang et al., 2015) and seed isoflavone content (Meng et al., 2016).

TABLE 1

Table 1. Frequency distribution and descriptive statistics of seed oil content, oleic acid, and linolenic acid content in CSLRP.

The materials were tested in randomized complete block design (RCBD) experiments, 0.7 m × 0.8 m hill plots with two replications at Jiangpu Experimental Station (abbreviated as JP) of Nanjing Agricultural University, Nanjing, China in 2008, 2009, and 2010, and two replications at Lishui Experimental Station (abbreviated as LS), Nanjing, China in 2009. The hill plots were thinned to six seedlings per plot (Wen et al., 2009). The planting dates were 20 June 2008JP, 19 June 2009JP, 26 June 2009LS, and 23 June 2010JP, where the codes of 2008JP, 2009JP, 2010JP, and 2009LS represent the environments composed of the year (2008–2010) and location (JP and LS), respectively.

A specimen of 20 g seeds for each replication, each accession of the CSLRP were milled with a 1095 Knifetec sample mill (FOSS Tecator, Denmark), then NIR spectroscopy analysis was performed using VECTOR22/N (BRUKER, German), and finally the SOC, OAC, LAC were converted using the calibration model developed by Wang (2011).

Statistical Analysis

A joint analysis of variance (ANOVA) was conducted for the CSLRP using PROC GLM of SAS 9.4 (SAS Institute Inc., Cary, NC, USA), in which the genotype, environment, replication, and genotype-by-environment interaction were considered to be random effects. The heritability (h²) was estimated as ${\hat{h}}^{2} = {\hat{σ}}_{g}^{2} / ({\hat{σ}}_{g}^{2} + {\hat{σ}}^{2} / r)$ for individual environments and ${\hat{h}}^{2} = {\hat{σ}}_{g}^{2} / [{\hat{σ}}_{g}^{2} + {\hat{σ}}_{g e}^{2} / s + σ^{2} / (s r)]$ for multi-environment joint analysis, where ${\hat{σ}}_{g}^{2}$ , ${\hat{σ}}_{g e}^{2}$ and ${\hat{σ}}^{2}$ are estimated variances of genotype, genotype-by-environment interaction, and the random error, respectively, and s is the number of environments and r is the number of replications in an experiment (Hanson et al., 1956). The variance components were estimated using REML method with PROC VARCOMP of SAS 9.4. The genetic coefficient of variation was calculated as ${\hat{σ}}_{g} / μ$ , where μ is the population mean.

Genotyping

The RAD-Seq (restriction site-associated DNA sequencing) was used for SNP genotyping in the present study. All the genotyping work was done at BGI Tech, Shenzhen, China. A total of 116,769 SNPs were identified after quality control and grouped into 29,119 SNP linkage disequilibrium blocks (SNPLDBs) according to He et al. (2017), Zhang et al. (2015), and Meng et al. (2016). The SNPLDB is a segment with its SNPs linked together. The sequence of each SNPLDB/segment differentiated among the 366 landraces and formed haplotypes in the same region which were considered to be alleles on a same locus/SNPLDB.

Association Mapping

Association mapping was conducted with the innovative restricted two-stage multi-locus GWAS (RTM-GWAS) procedure (He et al., 2017). At the first stage, single-locus association test based on the simple linear model was used to eliminate redundant markers, and at the second stage, the stepwise regression was applied to build the final multi-locus model based on candidate markers pre-selected from the first stage. The top 10 eigenvectors (accounting for 86% of the total variation) of the genetic similarity coefficient matrix built on SNPLDBs were incorporated as covariates to correct for population structure.

Since the environment factor in the present study involved 3 years and two locations in a same city which did not relate to certain fixed factors, therefore, the whole set of the data rather than individual environment data were used for association mapping. The mean data set across all environments were used for association analysis with a normal significance level of 0.02 as the built-in control for experiment-wise error rate of multi-locus model. As more stringent significance levels, such as 0.0002, were also suggested in other multi-locus methods such as mrMLM. Therefore, to identify candidate genes corresponding to major QTLs, a significance level of 0.0002 was also used in RTM-GWAS.

To compare the results with the previous GWAS methods, the MLM GWAS were also performed. The population structure matrix (Q) estimated from STRUCTURE 2.2 (Pritchard et al., 2000a) and the familial relatedness matrix (K) were used jointly, and the association analysis was performed using TASSEL software (Bradbury et al., 2007).

Genetic Differentiation Analysis

The analysis of molecular variance (AMOVA) for molecular variance among ecoregions was carried out based on the whole genome SNPLDBs and the SNPLDBs associated with the seed oil traits, respectively. The Arlequin 3.5.2.2 software were used for the computations (Excoffier and Lischer, 2010). To examine the difference among the ecoregion QTL-allele matrices, the chi-squared test was used to test the independence of the allele frequency distribution among ecoregions for each locus using the PROC FREQ in SAS.

Optimal Cross Prediction

All possible single crosses among the 366 accessions (66,795) were generated in silico under linkage model and independent assortment model for recombination potential of the seed oil traits (He et al., 2017). For each cross, the predicted genotypic value of seed oil traits was calculated based on 2,000 continuously inbred progenies derived from F₁ individuals and the QTL-allele matrix. The 99th percentile of a cross was used as its predicted cross value for SOC and OAC, and the 1st percentile of a cross was used as its predicted cross value for LAC.

Candidate Gene Annotation

From the detected QTL system in CSLRP, the candidate genes were annotated using the following steps: firstly, the genes located <100 kb away from the associated SNPLDBs were identified based on SoyBase (http://soybase.org); secondly, those genes containing SNPs in the population were identified; and finally, those genes containing SNPs significantly associated with the detected SNPLDB through chi-square test were considered as candidate genes. The annotations and expression data of candidate genes were retrieved from SoyBase (http://soybase.org), and the genes were grouped into three categories, i.e., biological process, cellular component, and molecular function. The pathway analysis of candidate genes was performed based on SoyCyc Soybean Metabolic Pathway Database (https://soycyc.soybase.org).

Results

Phenotypic and Genotypic Variation of Seed Oil Traits in the CSLRP

There is a wide variation of the seed oil traits in the CSRLP, ranging from 14.95 to 26.42%, 10.26 to 40.44%, and 1.95 to 11.61% for SOC, OAC, and LAC, respectively (Table 1). The heritability of SOC, OAC, and LAC were estimated as 91.29, 90.97, and 90.24% with the genetic coefficient of variation ranging from 5.65 to 16.68. The results from ANOVA showed significant genotype-by-environment interactions, while the genotypic variation among the landraces was 10.24 (LAC) to 11.78 (SOC) times of that of genotype-by-environment interaction in CSLRP (Supplementary Table 2).

The whole sample was separated into six sub-samples according to ecoregions in China (Table 1). The phenotype differences of the seed oil traits among ecoregion means were not large, but large phenotype variation within ecoregion existed, indicating abundant variation in each ecoregion. Among ecoregions, the varieties from ecoregion I (Northeast China) have relatively more SOC and OAC, but less LAC. That might be due to the enhancement of soybean improvement on seed oil traits in this region during the recent decades. Since large variation existed in each ecoregion and the soybean landraces were developed independently by local farmers, exploring QTL-allele constitutions of each ecoregion may provide information on genetic improvement potential of the three oil traits.

The QTL Systems of Seed Oil Traits in the CSRLP

The comparison for the number of significantly associated markers obtained from RTM-GWAS and MLM was summarized in Table 2. If correction for multiple testing is not considered (using a normal significance level of 0.05), the MLM method showed tremendous overflowing (R² > 100%) of the total phenotypic contribution. But in contrast, a large amount of missing heritability in comparison with the trait heritability (especially for SOC) was observed for the MLM method under the Bonferroni-adjusted significance level (Supplementary Figure 1). However, with the RTM-GWAS method which was based on multi-locus association analysis, a total of 50, 98, and 50 SNPLDBs were detected to be significantly (under a significance level of 0.02) associated with SOC, OAC, and LAC, respectively (Figure 1). The total contribution to phenotype variance of the associated loci were 82.53, 90.29, and 83.84%, which were close to but did not exceed the trait heritability values, 91.29, 90.97, and 90.24%, respectively (Table 2). A more stringent significance level of 0.0002 was also used in RTM-GWAS, and 13, 19, and 12 in a total of 42 SNPLDBs (two loci overlap) were identified for SOC, OAC, and LAC, respectively. As expected, the SNPLDBs detected using stringent significance level were included in the SNPLDBs under normal significance level with the same p-value order, except that two SNPLDBs for SOC were newly detected using 0.0002 (Tables 3, 4, Supplementary Table 3). The loci excluded due to stringent significance level were mainly small-contribution loci (R² < 1%), and the total contribution to phenotype variance decreased slightly to 55.38, 67.47, and 56.63%, respectively. Therefore, major loci can be identified directly from RTM-GWAS results without recalculation using a stringent significance level.

TABLE 2

Table 2. Comparisons of the GWAS results of the seed oil traits in CSLRP among the two association analysis procedures.

TABLE 3

Table 3. The detected SNPLDBs associated with seed oil content in CSLRP.

TABLE 4

Table 4. The detected SNPLDBs associated with oleic acid and linolenic acid content in CSLRP.

FIGURE 1

Figure 1. Manhattan plots of three seed oil traits with the RTM-GWAS method. (A) Seed oil content (%), (B) oleic acid content (%) in seed oil, (C) linolenic acid content (%) in seed oil. The –log₁₀(P) was set to 20 if it was >20. The horizontal dotted lines represent the genome-wide threshold.

Among the detected QTLs of the three seed oil traits, 23, 21, and 19 large-contribution QTLs (R² ≥ 1%) explained a total of 68.77, 69.76, and 66.63% of phenotypic variation, while 27, 77, and 31 small-contribution QTLs (R² < 1%) explained a total of 13.76, 20.53, and 17.21% of phenotypic variation, respectively (Tables 3, 4, Supplementary Table 4). In the SOC QTL system, 50 QTLs were detected with each QTL contribution to phenotypic variance varied continuously and greatly from 0.26 to 10.60%, and a group of minor QTLs were detected collectively but not individually with a total contribution of 8.76%. The SOC QTL system composed of both individually and collectively detected QTLs, and similar phenomenon was also observed for OAC and LAC. This indicated that the three seed oil traits are genetically complex traits, and there are great potentials in the improvement of the three oil traits through genetic recombination. Therefore, how to converge all or most of the elite alleles through breeding procedures is to be explored.

The SOC QTLs distributed on 18 chromosomes except Gm14 and Gm19 (Supplementary Table 5). There were 1-10 QTLs located on each chromosome. Gm20 harbored 10 QTL, with a total phenotypic contribution of 25.15%, indicating its genetic importance for oil content. The OAC QTLs distributed on all 20 chromosomes, also with 1-10 QTLs located on each chromosome. Gm09 harbored 10 QTLs, explaining a total of 4.45% of phenotypic variation, while Gm04 had 5 QTLs, explaining a total of 20.46% of phenotypic variation, suggesting Gm04 being of genetic importance in determining oleic content. The LAC QTLs distributed on 17 chromosomes except Gm08, Gm19, and Gm20. There were 1-6 QTLs located on each chromosome, and Gm04 had 6 QTLs, explaining a total of 24.66% of phenotypic variation, suggesting Gm04 being of genetic importance in determining linolenic content.

The QTL-Allele Matrices of Seed Oil Traits of the CSLRP

There were 136, 283, and 154 alleles on 50, 98, and 50 QTLs of SOC, OAC, and LAC, respectively, with their allele effects estimated from RTM-GWAS (Supplementary Table 6). The QTL-allele effects in the 366 Chinese soybean landraces can be organized into 136 × 366, 283 × 366, and 154 × 366 (allele × accession) matrices which were designated as QTL-allele matrix of SOC, OAC, and LAC of CSLRP, respectively. This QTL-allele matrix contains all the genetic information of the population and in fact is a matrix of the genetic constitution of the population. Figure 2 showed a compact form of the 50 × 366 (locus × accession) SOC QTL-allele matrix of CSLRP with allele effects presented in color gradient. Each row represents the allele distribution among accessions for a QTL, while each column indicates the allele constitution of an accession over all QTLs. It can be found that no landrace contains alleles that are all with negative or positive effect on the 50 loci, and lines of high oil content have more alleles with positive effect. The population contains mainly alleles with positive effect for QTLs at the top, and the alleles with negative effect are in only several lines. In contrast, the population contains mainly alleles with negative effect for QTLs at the bottom. The OAC and LAC QTL-allele matrices showed similar patterns (Supplementary Figures 2, 3).

FIGURE 2

Figure 2. The information of seed oil content QTL-allele matrix of CSLRP. (A) Effect distribution of 136 alleles on the 50 loci for seed oil content in CSLRP. A bar represents an allele. (B) Distribution of number of alleles on the 50 loci for seed oil content (%). (C) Graphical representation of seed oil content QTL-allele matrix of CSLRP. The horizontal axis represents accessions arranged in a rising order of their oil content (%), while the vertical axis represents QTL arranged in a rising order of their positive allele frequency. Each row represents the allele distribution among accessions for a QTL, while each column indicates the allele constitution of an accession over all QTLs. Allele effects are expressed in color cells with warm colors indicating positive effects and cool colors indicating negative effects, and the color depth indicates effect size.

The QTL-allele constitution of 10 lowest and highest SOC landraces were listed in Table 5. It can be found that the two landrace groups shared same alleles on nine loci (five alleles with positive effect and four alleles with negative effect), and differentiation existed on 41 loci. The total number of alleles with positive effect in the high-SOC group was 295 (ranging from 27 to 33 with an average of 29.5 per landrace), while was 227 in the low-SOC group (ranging from 19 to 25 with an average of 22.7 per landrace). The landrace N24274 with the highest SOC of 24.1% composed of 31 and 19 alleles with positive and negative effect, respectively, while the landrace N24600 with the lowest SOC of 15.8% composed of 20 and 30 alleles with positive and negative effect, respectively. Similar phenomenon was observed for OAC and LAC (Supplementary Tables 7, 8). Therefore, for the three seed oil traits, a great potential in recombination breeding may be achieved if redundant linkage can be broken. Furthermore, some alleles with positive effect only existed in accessions with the highest SOC, while some alleles with negative effect were only in accessions with the lowest SOC. For example, the 6-th allele of locus Oil-a-20-7 with an effect of −0.9 only existed in accessions with the lowest SOC, while the 3-th allele with an effect of 0.5 only existed in accessions with the highest SOC (Table 5). Similar results were observed for OAC and LAC. This indicated that in addition to the accumulation of alleles with positive effect, emergence of elite alleles may be another way in the improvement of the seed oil traits.

TABLE 5

Table 5. QTL-allele constitution of twenty accessions with the lowest and highest seed oil content in CSLRP.

Genetic Differentiation Among Ecoregion Subpopulations in the CSLRP

The AMOVA showed that significant genetic differentiations existed among ecoregions as well as among landraces within each ecoregion for seed oil traits in the CSLRP (Supplementary Table 9). The within ecoregion variance component was much greater than the among ecoregion variance component, which implied that a great variation happened in each ecoregion after the ancient ancestors moved to different ecoregions. The whole QTL-allele matrix was then separated into six ecoregion matrices for each trait. The independence of the allele frequency distribution of detected QTL among the ecoregions was tested with Chi-square criterion, and 28, 56, and 30 QTLs showed significant differentiation among the ecoregions for SOC, OAC, and LAC, respectively (Supplementary Table 6).

It was assumed that allele with the highest frequency was the original allele of one locus while allele with lowest frequency was newly happened or mutant allele. There are six kinds of differentiation patterns (Table 6): (1) As Oil-a-03-1 and Oil-a-11-1, the first major allele was positive while negative allele was its mutant, multiple alleles existed on a locus and all the alleles disseminated to all six ecoregions or commonly shared by all ecoregions. (2) As Oil-a-15-2, the first major allele was negative while positive allele was its mutant, multiple alleles existed at a locus and all the alleles disseminated to all six ecoregions. (3) As Oil-a-03-2, only two alleles existed on this locus, one was dominant, the other was newly happened but disseminated to other ecoregions, the major allele was negative while positive was its mutant. (4) The opposite situation to the third pattern with the major allele was positive. (5) As Oil-a-18-2 and Oil-a-20-3, the first major allele was negative while positive allele was its mutant, the three major alleles disseminated across all the ecoregions, while another three alleles, as newly happened, only disseminated in some ecoregions. (6) As Oil-a-20-7, the opposite situation to the fifth pattern with the first major allele was positive.

TABLE 6

Table 6. Allele frequency distribution for large effect (R² > 3%) seed oil content QTL in CSLRP.

Based on the individual QTL differentiation among ecoregions, ecoregion matrices differentiated accordingly. The obvious differentiation appeared in the existence of specific ecoregion alleles. More than 63% alleles in each trait were found in all six ecoregions while some alleles (<5% for each trait) existed in two or even one ecoregion(s) which were considered as specific ecoregion alleles (Supplementary Figure 4, Supplementary Tables 6, 10). Among the six ecoregions, ecoregion I, and ecoregion II had more ecoregion specific alleles than other ecoregions, for example, allele “7” of Gm04_BLOCK96_9670209_9789832 existed only in ecoregion I. Recognizing ecoregion-specific alleles is of great importance in studying allele evolution and ecoregion-allele relationships.

Genomic Selection for Optimal Crosses in Recombination Breeding

Based on the detected QTL-allele matrices, the recombination potentials of all possible single crosses were estimated using with- and without-linkage prediction model, and more recombination cycles is needed for realization of the without-linkage prediction in breeding programs (Table 7). There was no large difference in predicted values between the two models, and therefore results from with-linkage prediction were mentioned in the present study. The maximum value within ecoregions for SOC and OAC could be achieved as 24.20% (ecoregion II) and 38.38% (ecoregion I), while the maximum value among different ecoregions for SOC and OAC could be achieved as 24.76% [N05283.2 (III) × N05193 (II)] and 40.30% [N23538 (I) × N23561 (II)], indicating that the crosses with parents from different ecoregions could achieve higher transgressive value for SOC and OAC. However, the minimum value within ecoregions for LAC was 2.37% [N24278 (I) × N23538 (I)], which was even less than the value among the different ecoregions. From the prediction results, 15 potential crosses within and among ecoregions for each trait were recommended for seed oil traits breeding programs (Supplementary Table 11). These predictions indicate a great potential of seed oil traits improvement through recombination breeding in CSRLP based on the genetic dissection of population using RTM-GWAS.

TABLE 7

Table 7. The predicted seed oil content, oleic acid, and linolenic acid content for all possible single crosses.

The optimal crosses for comprehensive improvement of SOC, OAC, and LAC were also estimated based on the prediction (Table 8). This was picked up from individual results of each trait. Since the individual trait prediction was based on linkage model, the comprehensive prediction should also be considered to include linkage information. Among the top 20 comprehensive high seed oil trait crosses, the best ones were from crosses between an ecoregion I parent (such as N23679 and N23538) and an ecoregion II parent (such as N09445 and N05193). It seems that the best seed oil trait parental materials are mainly located in ecoregion I with several materials from other ecoregions.

TABLE 8

Table 8. The predicted optimal crosses for seed oil traits improvement.

The Candidate Genes That Confer the Seed Oil Traits in the CSLRP

From the detected 13, 19, and 12 major QTLs of the three seed oil traits, a total of 38, 27 and 25 genes were annotated for SOC, OAC, and LAC, respectively (Supplementary Tables 12–14). These candidate genes, 28, 15, and 15 genes were selected according to chi-square test and grouped into three GO categories, i.e., biological process, cellular component, and molecular function for SOC, OAC, and LAC, respectively (Figure 3, Supplementary Figure 5).

FIGURE 3

Figure 3. Gene ontology classification of annotated genes for seed oil content.

There were two major common QTLs between OAC and LAC, and no common QTL was found for SOC (Supplementary Table 3). The SNPLDB Gm04_BLOCK96_9670209_9789832 was detected to be associated with OAC (Oleic-a-04-2) and LAC (Linolenic–a-04-2), which explained the highest phenotypic variation for both OAC (13.20%) and LAC (16.83%). In the marker region, Glyma04g11250 (Haloacid dehalogenase-like hydrolase superfamily protein) and Glyma04g11330 (Transducin/WD40 repeat-like superfamily protein) were identified as candidate genes. The SNPLDB Gm18_BLOCK530_61189911_61190505 was detected to be associated with OAC (Oleic-a-18-9) and LAC (Linolenic–a-18-4), which explained a high portion of phenotypic variation for both OAC (4.61%) and LAC (2.79%). Six candidate genes were identified within this region, including Glyma18g52590 [delta(3), delta(2)-enoyl CoA isomerase 1] which was previously reported to be essential for the beta-oxidation of unsaturated fatty acids (Geisbrecht et al., 1998).

In the present gene systems of three seed oil traits, there were eight genes on two loci shared by OAC and LAC with major contribution. It seemed that more shared genes appear between the two fatty acid traits rather than with SOC. However, SOC as a super trait should have a more genetic relationship with its fatty acid components theoretically, and therefore further studies on SOC are needed.

Discussion

The Advantages of the Innovative RTM-GWAS Procedure

Firstly, RTM-GWAS is based on multi-allelic SNPLDB marker rather than bi-allelic SNP marker used by other GWAS methods. The SNPLDB marker can match genetic locus with varied number of alleles, and detecting QTL by SNPLDB marker should be more effective and powerful than bi-allelic SNP marker for germplasm populations. Secondly, RTM-GWAS is based on an efficient two-stage association analysis, where the markers were preselected by single-locus model followed by multi-locus model stepwise regression. In multi-locus model analysis, loci are jointly fitted and tested in a joint linear model, and the experiment-wise error is controlled under the normal significance level. Therefore, no additional multiple correction is needed. This is different from GWAS methods based on single-locus model where a large number of independent statistical inferences are considered and the experiment-wise testing has to be completed through correction, such as Bonferroni correction, while the correction in fact is not a model test but an arbitrary adjustment. Since QTL detection is carried out at the second stage under multi-locus model in RTM-GWAS, the total variation explained by detected QTL will not overflow trait heritability.

Although there are already GWAS methods that implement multi-locus model, such as MLMM (Segura et al., 2013) and mrMLM (Wang et al., 2016), they are designed for bi-allelic SNP marker only and are not applicable to the multi-allelic SNPLDB marker. However, the mrMLM method based on SNP marker was also performed and 12, 25, and 9 SNPs in a total of 46 SNPs were detected with 37.16, 51.83, and 36.06% phenotype variation contribution for SOC, OAC, and LAC, respectively (Supplementary Table 15). In comparison with mrMLM, 13, 19, and 12 SNPLDBs in a total of 42 SNPLDBs were detected with 55.38, 67.47, and 56.63% of phenotype variation contribution using RTM-GWAS under the same significance level of 0.0002 (Supplementary Table 3). Among the 46 SNPs, 7 SNPs with 1.09–3.76% phenotype variation contribution were found to overlap with 7 large contribution (R² > 1%) QTLs identified by RTM-GWAS. RTM-GWAS detected the large effect loci detected by mrMLM, but it should be noted that mrMLM results were based on SNP marker while RTM-GWAS results were based on SNPLDB marker, and it might be not an exact comparison since multiple SNPs included in a SNPLDB which may provide more allele information than a single SNP. In fact, each GWAS method has its own advantages and disadvantages and fits respective purposes. That is why researchers try to improve it from different aspects. For example, some researchers aimed to find a handful of quantitative trait nucleotides (QTN) for identifying some major genes. As a quantitative trait, especially such as oil content which is the final product of a series of biological processes, is conferred by many genes or QTLs, what we concerned is how to identify the genetic system of a trait in a germplasm population, and this is different from targeting on a few loci for individual gene study. Unfortunately, our results have not reached the goal yet due to the difficulty in controlling the experimental error and there is still a relatively small part of the genetic variation to be explained.

The Detected QTL Systems of Seed Oil Traits in Comparison With Those in the Literature

The QTLs in SoyBase (http://soybase.org) were incorporated into 54, 25, and 21 conformity QTLs according to their position and precision (Supplementary Table 1). These conformity QTLs can match 37, 19, and 9 QTLs detected in this study, therefore, about 13, 79, and 29 QTLs detected in the present study for SOC, OAC, and LAC are newly discovered, respectively (Tables 3, 4). Here the QTLs of SOC in SoyBase were obtained from 30 different mapping populations (mainly recombinant inbred lines) with 49 different parental materials, those of the OAC in SoyBase were from 9 different populations with 17 different parental accessions, and those of the LAC in SoyBase were from 10 different populations with 20 different parental accessions. However, those are only making comparisons and supports, not necessarily a direct strong validation, because the previous results are not necessarily complete and exact. As the previous QTL studies on seed oil traits are still quite limited, logically to compare back with the previous results is only a check and not enough as a validation. In this case, to evaluate the present results, what we considered is how much improvement had been made for the new procedure and what the new finds of the study were. The number of overlapped QTLs is not necessary to reflect the reliability and efficiency of a new method or a new study. Obviously, the present study provided more QTL-allele information than previous linkage mapping results with more precision and less expense. One reason is that materials used in the present study are a gene reservoir of the Chinese landrace population as from which the soybean origin area covering a wide range of genetic variance, and another reason is the high efficiency of RTM-GWAS procedure which provided the detected QTLs with 82.52–90.29% contribution to the phenotypic variation.

Population Characterization Based on QTL-Allele Matrices

Based on the relatively thorough detection of QTL-alleles through RTM-GWAS procedure, the QTL-allele matrix has provided a new tool for characterizing the populations. From the QTL-allele matrix, all kinds of genetic parameters can be obtained, such as QTL number, allele number, allele frequency, allele effect, genetic diversity, etc. The genetic differentiation among populations can be detected based on individual QTL, a group of QTLs, and a subpopulation. From the changes of allele frequencies, the evolutionary relationship among alleles can be detected. If the QTL-allele matrix is linked to ecological conditions, eco-genetics knowledge can be further revealed. In the present study, we have tried to conduct the analysis, but further results are to be explored. We believe the genome-wide QTL-allele matrix may be an important tool in population genetic studies.

Approaches to Achieve Genomic Selection for Transgressive Seed Oil Traits in the CSLRP

The genetic structure in term of QTL-allele matrix of the CSLRP showed that both positive and negative alleles existed in each accession for the seed oil traits, denoting a great recombination potential. According to prediction based on the linkage model, the SOC and OAC can be achieved as high as 24.76 and 40.30% and the LAC can be achieved as low as 2.37%. In plant breeding, this is the first stage of genomic selection, selection for optimal cross. The selected crosses are potential for next genomic selection stage, selection for best progenies to realize the potential or to obtain progenies with 24.76% SOC. There are some difficulties associated with classic genomic selection based on GEBVs (Meuwissen et al., 2001) in plant breeding, i.e., the large number of segregating progenies involving extremely high genotyping cost and uncertainty of marker-breeding value relationship due to the black box procedure. According to the present study, all genome-wide 50, 98, and 50 QTLs along with their 136, 283, and 154 alleles rather than all genome-wide SNPs of progenies should be genotyped for SOC, OAC, and LAC improvement, respectively. In such case, a small marker chip can meet the requirements of genomic selection for superior progenies, or other high throughput molecular marker technologies, such as high throughput PCR, is to be further explored. Therefore, the suggested novel GS procedure based on QTL-allele matrix in plants composes of GS for optimal crosses and GS for progeny selection. The prerequisite of this novel GS strategy is the precise and thorough QTL-allele dissection in the gene reservoir.

The above example of genomic selection for optimal crosses involves only three seed oil traits (Table 8). Since plant breeding usually involves multiple traits, genomic selection for comprehensive optimal crosses can be conducted by using multiple matrices or a weighted combination of multiple matrices. Similarly, GS for progenies incorporating multiple traits can be achieved using multiple trait markers or a weighted combination of multiple trait markers.

In addition, the allele effects obtained from the RTM-GWAS were additive effects since the materials used were inbred landraces. The additive by additive interaction effect was not considered in RTM-GWAS, but it was usually not large according to the reported linkage mapping results, especially for the three oil traits. Therefore, we recommend the prediction results should be relatively relevant to breeding programs.

Candidate Major QTL/Genes of Seed Oil Traits for Further Study

In addition to using the whole QTL-allele information in genomic selection for breeding, the information revealed through major QTL/genes can be further studied for exploring the gene network of the traits. The two QTLs located on chromosome Gm04, and Gm18 along with their annotated genes are important for further study as mentioned in the above text. Furthermore, chromosome Gm20 with four major SOC QTLs explaining up to 24.07% of the phenotypic variation was the most important chromosome for SOC, on which a number of studies also reported some SOC QTL using linkage mapping and association mapping. But no major QTL was found on chromosome Gm20 for both OAC and LAC. Chromosome Gm04 was found to be important for OAC and LAC, explaining up to 19.54 and 21.28% of the phenotypic variation, respectively, but no major QTL was found for SOC. These results suggested that seed oil content and fatty acid content may be determined by different QTLs on different chromosomes. Therefore, a great potential exists for recombination between seed oil content and fatty acid content in the CSRLP.

The expressional level of the candidate genes and the pathway of the candidate genes were analyzed, but only six QTL/genes were found to be related to OAC and LAC. Gene expression analysis showed that Glyma10g01590 (Oleic-a-10-1), Glyma11g30090 (Oleic-a-11-4), Glyma13g23800 (Oleic-a-13-4), Glyma17g34450 (Oleic-a-17-5), and Glyma17g34510 (Oleic-a-17-5) were showed high expressional levels at oil accumulation stage of seed development. Pathway analysis showed that only Glyma09g17170 (Linolenic-a-09-1) was fatty acid metabolism gene. The results indicated that oil content and fatty acid content were complex traits, and the existing information is relatively limited, further studies are needed to find major seed oil QTL/genes. In addition, GWAS results of a trait in germplasm population are obtained from one-direction inference, and validation of the results finally depends on finding all the genes through experimental molecular biology. Since our results can detect many more QTLs/genes in comparison to other GWAS procedures with only a handful QTLs detected, it seems difficult to completely validate all the QTLs through gene cloning in a short period. More effort is needed to explore the gene system from the QTL system of the three seed oil traits.

Author Contributions

JG, YZ, and JH conceived and designed the experiments. YZ, HW, SM, JZ, and GX performed the field experiments. JH, YL, SY, JZ, and TZ performed the genome sequencing. YZ and JH analyzed and interpreted the results. YZ, JH, and JG drafted the manuscript and all authors contributed to the manuscript revision.

Funding

This work was supported by the National Key R & D Program for Crop Breeding in China (2017YFD0101500, 2016YFD0100304, 2017YFD0102002), the Natural Science Foundation of China (31701452, 31701447, 31671718, 31571695), the MOE 111 Project (B08025), the MOE Program for Changjiang Scholars and Innovative Research Team in University (PCSIRT_17R55), the MOA CARS-04 program, the Jiangsu Higher Education PAPD Program, the Fundamental Research Funds for the Central Universities and the Jiangsu JCIC-MCP, the open funds of the State Key Laboratory of Crop Genetics and Germplasm Enhancement (ZW201712).

Conflict of Interest Statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Supplementary Material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpls.2018.01793/full#supplementary-material

Supplementary Figure 1. Manhattan plots of genome-wide association studies of three seed oil traits using MLM method. (A) Seed oil content; (B) oleic acid content; (C) linolenic acid content. The blue horizontal line represents significance level of 0.05, and the red horizontal line represents Bonferroni-adjusted significance level of 0.05.

Supplementary Figure 2. The information of oleic acid QTL-allele matrix. (A) Effect distribution of 283 alleles on the 98 loci for the oleic acid content (%). (B) Distribution of number of alleles on the 98 loci for oleic acid content (%). (C) The graphical presentation of oleic acid QTL-allele matrix. The horizontal axis represents accessions arranged in a rising order of oleic acid content (%), while the vertical axis represents QTL arranged in a rising order of their positive allele frequency. Each row represents the allele distribution among accessions for a QTL, while each column indicates the allele constitution of an accession over all QTLs. Allele effects are expressed in color cells with warm colors indicating positive effects and cool colors indicating negative effects, and the color depth indicates effect size.

Supplementary Figure 3. The information of linolenic acid QTL-allele matrix. (A) Effect distribution of 154 alleles on the 50 loci for the linolenic acid content (%). (B) Distribution of the number of alleles on the 50 loci for linolenic acid content (%). (C) The graphical presentation of linolenic acid QTL-allele matrix. The horizontal axis represents accessions arranged in a rising order of linolenic acid content (%), while the vertical axis represents QTL arranged in a rising order of their positive allele frequency. Each row represents the allele distribution among accessions for a QTL, while each column indicates the allele constitution of an accession over all QTLs. Allele effects are expressed in color cells with warm colors indicating positive effects and cool colors indicating negative effects, and the color depth indicates effect size.

Supplementary Figure 4. The number of alleles shared among different number of ecoregions.

Supplementary Figure 5. Gene ontology classifications of annotated genes for seed (A) oleic acid and (B) linolenic acid content.

Supplementary Table 1. Reported QTLs of three seed oil traits in SoyBase (http://soybase.org).

Supplementary Table 2. Joint analysis of variance of three seed oil traits in CSLRP.

Supplementary Table 3. The detected SNPLDBs associated with three seed oil traits under a significance level of 0.0002 in CSLRP.

Supplementary Table 4. The detected small effect (R2 < 0.01) SNPLDBs associated with oleic acid and linolenic acid content in CSLRP.

Supplementary Table 5. Distribution of detected QTL of three seed oil traits in CSLRP.

Supplementary Table 6. Allele frequency distribution among six ecoregions for the detected QTL of three seed oil traits in CSLRP.

Supplementary Table 7. QTL-allele constitution of twenty accessions with the lowest and highest oleic acid content (%) for the detected oleic acid QTLs in CSLRP.

Supplementary Table 8. QTL-allele constitution of twenty accessions with the lowest and highest linolenic acid content (%) for the detected linolenic acid QTLs in CSLRP.

Supplementary Table 9. Analysis of molecular variance among six ecoregions of CSLRP based on the SNPLDBs associated with three seed oil traits.

Supplementary Table 10. The ecoregion-specific alleles on the QTLs of three seed oil traits.

Supplementary Table 11. The predicted optimal crosses of the seed oil traits within and among ecoregions.

Supplementary Table 12. The candidate genes annotated from the detected SNPLDBs of seed oil content in CSLRP.

Supplementary Table 13. The candidate genes annotated from the detected SNPLDBs of oleic acid content in CSLRP.

Supplementary Table 14. The candidate genes annotated from the detected SNPLDBs of linolenic acid content in CSLRP.

Supplementary Table 15. The detected SNPs associated with seed oil traits using mrMLM in CSLRP.

References

Bahrami, G. (2010). “Trans and other fatty acids: effects on endothelial functions,” in Fatty Acids in Health Promotion and Disease Causation, ed R. R. Watson (Urbana: AOCS Press), 3–43.

Google Scholar

Bradbury, P. J., Zhang, Z., Kroon, D. E., Casstevens, T. M., Ramdoss, Y., and Buckler, E. S. (2007). TASSEL: software for association mapping of complex traits in diverse samples. Bioinformatics 23, 2633–2635. doi: 10.1093/bioinformatics/btm308

PubMed Abstract | CrossRef Full Text | Google Scholar

Cao, Y., Li, S., Wang, Z., Chang, F., Kong, J., Gai, J., et al. (2017). Identification of major quantitative trait loci for seed oil content in soybeans by combining linkage and genome-wide association mapping. Front. Plant Sci. 8:1222. doi: 10.3389/fpls.2017.01222

PubMed Abstract | CrossRef Full Text | Google Scholar

Excoffier, L., and Lischer, H. E. (2010). Arlequin suite ver 3.5: a new series of programs to perform population genetics analyses under Linux and Windows. Mol. Ecol. Resour. 10, 564–567. doi: 10.1111/j.1755-0998.2010.02847.x

PubMed Abstract | CrossRef Full Text | Google Scholar

Fang, C., Ma, Y., Wu, S., Liu, Z., Wang, Z., Yang, R., et al. (2017). Genome-wide association studies dissect the genetic networks underlying agronomical traits in soybean. Genome Biol. 18:161. doi: 10.1186/s13059-017-1289-9

PubMed Abstract | CrossRef Full Text | Google Scholar

Gai, J., Chen, L., Zhang, Y., Zhao, T., Xing, G., and Xing, H. (2012). Genome-wide genetic dissection of germplasm resources and implications for breeding by design in soybean. Breeding Sci. 61, 495–510. doi: 10.1270/jsbbs.61.495

PubMed Abstract | CrossRef Full Text | Google Scholar

Gai, J., and Wang, Y. (2001). A study on the varietal eco-regions of soybeans in China. Sci. Agric. Sin. 34, 139–145. doi: 10.3321/j.issn:0578-1752.2001.02.005

CrossRef Full Text | Google Scholar

Geisbrecht, B. V., Zhu, D., Schulz, K., Nau, K., Morrell, J. C., Geraghty, M., et al. (1998). Molecular characterization of Saccharomyces cerevisiae Delta3, Delta2-enoyl-CoA isomerase. J. Biol. Chem. 273, 33184–33191. doi: 10.1074/jbc.273.50.33184

PubMed Abstract | CrossRef Full Text | Google Scholar

Hanson, C., Robinson, H., and Comstock, R. (1956). Biometrical studies of yield in segregating populations of Korean Lespedeza. Agron. J. 48, 268–272. doi: 10.2134/agronj1956.00021962004800060008x

CrossRef Full Text | Google Scholar

He, J., Meng, S., Zhao, T., Xing, G., Yang, S., Li, Y., et al. (2017). An innovative procedure of genome-wide association analysis fits studies on germplasm population and plant breeding. Theor. Appl. Genet. 130, 2327–2343. doi: 10.1007/s00122-017-2962-9

PubMed Abstract | CrossRef Full Text | Google Scholar

Hwang, E. Y., Song, Q., Jia, G., Specht, J. E., Hyten, D. L., Costa, J., et al. (2014). A genome-wide association study of seed protein and oil content in soybean. BMC Genomics 15:1. doi: 10.1186/1471-2164-15-1

PubMed Abstract | CrossRef Full Text | Google Scholar

Hymowitz, T. (2004). “Speciation and cytogenetics,” in Soybeans: Improvement, Production, and Uses, eds H. R. Boerma and J. E. Specht (Madison, WI: American Society of Agronomy, Inc.; Crop Science Society of America, Inc.; Soil Science Society of America, Inc.), 97–136.

Google Scholar

Jonas, E., and de Koning, D. J. (2013). Does genomic selection have a future in plant breeding? Trends Biotechnol. 31, 497–504. doi: 10.1016/j.tibtech.2013.06.003

PubMed Abstract | CrossRef Full Text | Google Scholar

Leamy, L. J., Zhang, H., Li, C., Chen, C. Y., and Song, B. H. (2017). A genome-wide association study of seed composition traits in wild soybean (Glycine soja). BMC Genomics 18:18. doi: 10.1186/s12864-016-3397-4

PubMed Abstract | CrossRef Full Text | Google Scholar

Li, S., Cao, Y., He, J., Zhao, T., and Gai, J. (2017). Detecting the QTL-allele system conferring flowering date in a nested association mapping population of soybean using a novel procedure. Theor. Appl. Genet. 130, 2297–2314. doi: 10.1007/s00122-017-2960-y

PubMed Abstract | CrossRef Full Text | Google Scholar

Li, Y., Reif, J. C., Ma, Y., Hong, H., Liu, Z., Chang, R., et al. (2015). Targeted association mapping demonstrating the complex molecular genetics of fatty acid formation in soybean. BMC Genomics 16:841. doi: 10.1186/s12864-015-2049-4

PubMed Abstract | CrossRef Full Text | Google Scholar

Li, Y. H., Reif, J. C., Hong, H. L., Li, H. H., Liu, Z. X., Ma, Y.-S., et al. (2018). Genome-wide association mapping of QTL underlying seed oil and protein contents of a diverse panel of soybean accessions. Plant Sci. 266(Suppl. C), 95–101. doi: 10.1016/j.plantsci.2017.04.013

PubMed Abstract | CrossRef Full Text | Google Scholar

Meng, S., He, J., Zhao, T., Xing, G., Li, Y., Yang, S., et al. (2016). Detecting the QTL-allele system of seed isoflavone content in Chinese soybean landrace population for optimal cross design and gene system exploration. Theor. Appl. Genet. 129, 1557–1576. doi: 10.1007/s00122-016-2724-0

PubMed Abstract | CrossRef Full Text | Google Scholar

Meuwissen, T. H., Hayes, B. J., and Goddard, M. E. (2001). Prediction of total genetic value using genome-wide dense marker maps. Genetics 157, 1819–1829. Available online at: http://www.genetics.org/content/157/4/1819

PubMed Abstract | Google Scholar

Panthee, D. R., Pantalone, V. R., and Saxton, A. M. (2006). Modifier QTL for fatty acid composition in soybean oil. Euphytica 152, 67–73. doi: 10.1007/s10681-006-9179-3

CrossRef Full Text | Google Scholar

Peleman, J. D., and van der Voort, J. R. (2003). Breeding by design. Trends Plant Sci. 8, 330–334. doi: 10.1016/S1360-1385(03)00134-1

PubMed Abstract | CrossRef Full Text | Google Scholar

Price, A. L., Patterson, N. J., Plenge, R. M., Weinblatt, M. E., Shadick, N. A., and Reich, D. (2006). Principal components analysis corrects for stratification in genome-wide association studies. Nat. Genet. 38, 904–909. doi: 10.1038/ng1847

PubMed Abstract | CrossRef Full Text | Google Scholar

Pritchard, J. K., Stephens, M., and Donnelly, P. (2000a). Inference of population structure using multilocus genotype data. Genetics 155, 945–959. Available online at: http://www.genetics.org/content/155/2/945

PubMed Abstract | Google Scholar

Pritchard, J. K., Stephens, M., Rosenberg, N. A., and Donnelly, P. (2000b). Association mapping in structured populations. Am. J. Hum. Genet. 67, 170–181. doi: 10.1086/302959

PubMed Abstract | CrossRef Full Text | Google Scholar

Segura, V., Vilhjálmsson, B. J., Platt, A., Korte, A., Seren, Ü., Long, Q., et al. (2013). An efficient multi-locus mixed-model approach for genome-wide association studies in structured populations. Nat. Genet. 44, 825–830. doi: 10.1038/ng.2314

PubMed Abstract | CrossRef Full Text | Google Scholar

Sonah, H., O'Donoughue, L., Cober, E., Rajcan, I., and Belzile, F. (2015). Identification of loci governing eight agronomic traits using a GBS-GWAS approach and validation by QTL mapping in soya bean. Plant Biotechnol. J. 13, 211–221. doi: 10.1111/pbi.12249

PubMed Abstract | CrossRef Full Text | Google Scholar

Wang, H. (2011). Genetic Dissection and Elite Allele Identification of Seed Traits in Soybean Cultivars Released from Huang-huai Valleys and Southern China. Master's thesis. Nanjing Agricultural University, Nanjing.

Wang, S. B., Feng, J. Y., Ren, W. L., Huang, B., Zhou, L., Wen, Y. J., et al. (2016). Improving power and accuracy of genome-wide association studies via a multi-locus mixed linear model methodology. Sci. Rep. 6:19444. doi: 10.1038/srep19444

PubMed Abstract | CrossRef Full Text | Google Scholar

Wen, Z., Boyse, J. F., Song, Q., Cregan, P. B., and Wang, D. (2015). Genomic consequences of selection and genome-wide association mapping in soybean. BMC Genomics 16:671. doi: 10.1186/s12864-015-1872-y

PubMed Abstract | CrossRef Full Text | Google Scholar

Wen, Z., Ding, Y., Zhao, T., and Gai, J. (2009). Genetic diversity and peculiarity of annual wild soybean (G. soja sieb. et zucc.) from various eco-regions in China. Theor. Appl. Genet. 119, 371–381. doi: 10.1007/s00122-009-1045-y

PubMed Abstract | CrossRef Full Text | Google Scholar

Wilson, R. F. (2004). “Seed composition,” in Soybeans: Improvement, Production, and Uses, eds. H. R. Boerma and J. E. Specht (Madison, WI: American Society of Agronomy, Inc.; Crop Science Society of America, Inc.; Soil Science Society of America, Inc.), 621–677.

Google Scholar

Yu, J., Pressoir, G., Briggs, W. H., Vroh, B. I., Yamasaki, M., Doebley, J. F., et al. (2006). A unified mixed-model method for association mapping that accounts for multiple levels of relatedness. Nat. Genet. 38, 203–208. doi: 10.1038/ng1702

PubMed Abstract | CrossRef Full Text | Google Scholar

Zhang, Y., He, J., Wang, Y., Xing, G., Zhao, J., Li, Y., et al. (2015). Establishment of a 100-seed weight quantitative trait locus-allele matrix of the germplasm population for optimal recombination design in soybean breeding programmes. J. Exp. Bot. 66, 6311–6325. doi: 10.1093/jxb/erv342

PubMed Abstract | CrossRef Full Text | Google Scholar

Zhou, Z., Jiang, Y., Wang, Z., Gou, Z., Lyu, J., Li, W., et al. (2015). Resequencing 302 wild and cultivated accessions identifies genes related to domestication and improvement in soybean. Nat. Biotech. 33, 408–414. doi: 10.1038/nbt.3096

PubMed Abstract | CrossRef Full Text | Google Scholar

Zhu, C., Gore, M., Buckler, E. S., and Yu, J. (2008). Status and prospects of association mapping in plants. Plant Genome 1, 5–20. doi: 10.3835/plantgenome2008.02.0089

CrossRef Full Text | Google Scholar

Keywords: soybean, seed oil content, oleic acid content, linolenic acid content, restricted two-stage multi-locus genome-wide association study (RTM-GWAS), SNP linkage disequilibrium block (SNPLDB), genetic differentiation, genomic selection for optimal cross

Citation: Zhang Y, He J, Wang H, Meng S, Xing G, Li Y, Yang S, Zhao J, Zhao T and Gai J (2018) Detecting the QTL-Allele System of Seed Oil Traits Using Multi-Locus Genome-Wide Association Analysis for Population Characterization and Optimal Cross Prediction in Soybean. Front. Plant Sci. 9:1793. doi: 10.3389/fpls.2018.01793

Received: 10 May 2018; Accepted: 19 November 2018;
Published: 05 December 2018.

Edited by:

Nicolas Rispail, Spanish National Research Council (CSIC), Spain

Reviewed by:

Hailong Ning, Northeast Agricultural University, China
Wenxin Liu, China Agricultural University, China

Copyright © 2018 Zhang, He, Wang, Meng, Xing, Li, Yang, Zhao, Zhao and Gai. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Junyi Gai, c3JpQG5qYXUuZWR1LmNu

^†These authors have contributed equally to this work

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.