Detecting the QTL-Allele System of Seed Oil Traits Using Multi-Locus Genome-Wide Association Analysis for Population Characterization and Optimal Cross Prediction in Soybean

Soybean is one of the world's major vegetative oil sources, while oleic acid and linolenic acid content are the major quality traits of soybean oil. The restricted two-stage multi-locus genome-wide association analysis (RTM-GWAS), characterized with error and false-positive control, has provided a potential approach for a relatively thorough detection of whole-genome QTL-alleles. The Chinese soybean landrace population (CSLRP) composed of 366 accessions was tested under four environments to identify the QTL-allele constitution of seed oil, oleic acid and linolenic acid content (SOC, OAC, and LAC). Using RTM-GWAS with 29,119 SNPLDBs (SNP linkage disequilibrium blocks) as genomic markers, 50, 98, and 50 QTLs with 136, 283, and 154 alleles (2–9 per locus) were detected, with their contribution 82.52, 90.31, and 83.86% to phenotypic variance, corresponding to their heritability 91.29, 90.97, and 90.24% for SOC, OAC, and LAC, respectively. The RTM-GWAS was shown to be more powerful and efficient than previous single-locus model GWAS procedures. For each trait, the detected QTL-alleles were organized into a QTL-allele matrix as the population genetic constitution. From which the genetic differentiation among 6 eco-populations was characterized as significant allele frequency differentiation on 28, 56, and 30 loci for the three traits, respectively. The QTL-allele matrices were also used for genomic selection for optimal crosses, which predicted transgressive potential up to 24.76, 40.30, and 2.37% for the respective traits, respectively. From the detected major QTLs, 38, 27, and 25 candidate genes were annotated for the respective traits, and two common QTL covering eight genes were identified for further study.

Breeding progress depends on the potential gene resources in germplasm, such as landraces which were developed historically by farmers. The key for utilization of required genes in the germplasm population is to explore the genetic loci and alleles underlying breeding traits and to identify superior alleles. Soybean originated in China, where the crop has been cultivated for more than 5000 years (Hymowitz, 2004). During the long history, ancient Chinese farmers have developed a great number of landraces which accumulate tremendous genetic variation, and therefore, the Chinese soybean landraces are the most important gene/germplasm reservoirs for breeding programmes (Gai et al., 2012).
Soybean seed oil traits, i.e., the oil and fatty acid content, are complex traits involving a large number of genes. At present, a number of QTLs for seed oil traits have been mapped on 20 chromosomes in soybean based on linkage mapping (Supplementary Table 1). However, linkage mapping is usually applied to a segregating population derived from two parents, and therefore it is limited in terms of allelic diversity and mapping resolution (Zhu et al., 2008). The genomewide association study (GWAS) is found to be a powerful approach to detect QTL and their multiple alleles at a relatively higher resolution, and it can also be directly applied to natural populations such as germplasm. Although a number of GWASs have been performed for soybean seed oil content (Hwang et al., 2014;Sonah et al., 2015;Wen et al., 2015;Zhou et al., 2015;Cao et al., 2017;Li et al., 2018), only few are for fatty acid composition Fang et al., 2017;Leamy et al., 2017).
The previous GWAS procedures concentrate on finding a handful of major loci, such as general linear model and mixed linear model (MLM) approaches (Pritchard et al., 2000b;Price et al., 2006;Yu et al., 2006) based on single-locus model, and even MLMM (Segura et al., 2013) and mrMLM (Wang et al., 2016) based on multi-locus model. But plant breeders are more likely interested in exploring the whole QTL-allele system for both forward selection and background control in breeding programs. Furthermore, the previous GWASs are generally based on SNP markers which involve only two alleles at one site, therefore the multi-allelic variation which widely exists in germplasm population cannot be detected.
To overcome these limitations, He et al. (2017) proposed an innovative restricted two-stage multi-locus GWAS procedure (RTM-GWAS) for a relatively thorough detection of QTL and their multiple alleles in a germplasm population. In the RTM-GWAS procedure, the tightly linked SNPs are grouped into SNP linkage disequilibrium blocks (SNPLDBs) to form genomic markers with multiple haplotypes as alleles, and then it utilizes two-stage association analysis based on a multi-locus multi-allele model for genome-wide QTL identification along with their multiple alleles. Simulation studies demonstrated that RTM-GWAS achieved the highest QTL detection power and efficiency compared with the previous GWAS procedures, especially under large sample size and high trait heritability conditions. The RTM-GWAS procedure has been applied to identify QTL-allele system of 100-seed weight (Zhang et al., 2015) and seed isoflavone content (Meng et al., 2016) in CSLRP. More recently, Li et al. (2017) applied the RTM-GWAS procedure to a soybean nested association mapping population, and identified 139 flowering date QTLs with 496 alleles, which cover almost all QTLs detected by four other mapping procedures.
Optimal cross design and precise progeny selection are two major steps in conventional plant breeding with the former determining the potential of the latter. Peleman and van der Voort (2003) presented "Breeding by Design" concept based on QTL mapping, aiming to choose parents and design crosses for potential recombination. Meuwissen et al. (2001) proposed genomic selection (GS) as a marker-assisted selection procedure based on genome-wide SNP/markers. GS composes two links, establishing an index between required targets and SNPs/markers from a training population and then using the index in progeny selection based on its genome-wide SNP/marker information. Jonas and de Koning (2013) indicated that GS approaches from dairy cattle breeding cannot be readily applied to complex plant breeding. Therefore, following the "Breeding by Design" concept, GS based on whole-genome QTL-allele system detected from RTM-GWAS seems to be a potential approach for both optimal cross design and precise progeny selection .
In the present study, the CSLRP was used to explore QTLallele constitutions of three major seed oil traits, i.e., SOC, OAC, and LAC, in the most important soybean gene/germplasm reservoir. Accordingly, the QTL-allele matrices were established as a compact form of the population genetic structure of seed oil traits. The matrices were used to characterize the genetic differentiation among ecoregion subpopulations and to select optimal crosses for seed oil improvement in soybean breeding. Accordingly, the candidate gene system was annotated from the detected QTLs for further study on the oil trait genes.

Plant Materials and Field Experiments
A sample composed of 366 soybean landraces as representative of CSLRP was used for the present study. The sampled accessions have their origination distributed in the six soybean cultivation ecoregions in China (Gai and Wang, 2001 (Table 1). This population has been used in the establishment of QTL-allele matrix for the 100-seed weight (Zhang et al., 2015) and seed isoflavone content (Meng et al., 2016).
The materials were tested in randomized complete block design (RCBD) experiments, 0.7 m × 0.8 m hill plots with two replications at Jiangpu Experimental Station (abbreviated as JP) of Nanjing Agricultural University, Nanjing, China in 2008, 2009, and two replications at Lishui Experimental Station (abbreviated as LS), Nanjing, China in 2009. The hill plots were thinned to six seedlings per plot (Wen et al., 2009 A specimen of 20 g seeds for each replication, each accession of the CSLRP were milled with a 1095 Knifetec sample mill (FOSS Tecator, Denmark), then NIR spectroscopy analysis was performed using VECTOR22/N (BRUKER, German), and finally the SOC, OAC, LAC were converted using the calibration model developed by Wang (2011).

Statistical Analysis
A joint analysis of variance (ANOVA) was conducted for the CSLRP using PROC GLM of SAS 9.4 (SAS Institute Inc., Cary, NC, USA), in which the genotype, environment, replication, and genotype-by-environment interaction were considered to be random effects. The heritability (h 2 ) was estimated asĥ 2 = σ 2 g /(σ 2 g +σ 2 /r) for individual environments andĥ 2 =σ 2 g /[σ 2 g + σ 2 ge /s + σ 2 /(sr)] for multi-environment joint analysis, wherê σ 2 g ,σ 2 ge andσ 2 are estimated variances of genotype, genotypeby-environment interaction, and the random error, respectively, and s is the number of environments and r is the number of replications in an experiment (Hanson et al., 1956). The variance components were estimated using REML method with PROC VARCOMP of SAS 9.4. The genetic coefficient of variation was calculated asσ g /µ , where µ is the population mean.

Genotyping
The RAD-Seq (restriction site-associated DNA sequencing) was used for SNP genotyping in the present study. All the genotyping work was done at BGI Tech, Shenzhen, China. A total of 116,769 SNPs were identified after quality control and grouped into 29,119 SNP linkage disequilibrium blocks (SNPLDBs) according to He et al. (2017), Zhang et al. (2015), and Meng et al. (2016). The SNPLDB is a segment with its SNPs linked together. The sequence of each SNPLDB/segment differentiated among the 366 landraces and formed haplotypes in the same region which were considered to be alleles on a same locus/SNPLDB.

Association Mapping
Association mapping was conducted with the innovative restricted two-stage multi-locus GWAS (RTM-GWAS) procedure . At the first stage, single-locus association test based on the simple linear model was used to eliminate redundant markers, and at the second stage, the stepwise regression was applied to build the final multi-locus model based on candidate markers pre-selected from the first stage. The top 10 eigenvectors (accounting for 86% of the total variation) of the genetic similarity coefficient matrix built on SNPLDBs were incorporated as covariates to correct for population structure.
Since the environment factor in the present study involved 3 years and two locations in a same city which did not relate to certain fixed factors, therefore, the whole set of the data rather than individual environment data were used for association mapping. The mean data set across all environments were used for association analysis with a normal significance level of 0.02 as the built-in control for experiment-wise error rate of multi-locus model. As more stringent significance levels, such as 0.0002, were also suggested in other multi-locus methods such as mrMLM. Therefore, to identify candidate genes corresponding to major QTLs, a significance level of 0.0002 was also used in RTM-GWAS.
To compare the results with the previous GWAS methods, the MLM GWAS were also performed. The population structure matrix (Q) estimated from STRUCTURE 2.2 (Pritchard et al., 2000a) and the familial relatedness matrix (K) were used jointly, and the association analysis was performed using TASSEL software (Bradbury et al., 2007).

Genetic Differentiation Analysis
The analysis of molecular variance (AMOVA) for molecular variance among ecoregions was carried out based on the whole genome SNPLDBs and the SNPLDBs associated with the seed oil traits, respectively. The Arlequin 3.5.2.2 software were used for the computations (Excoffier and Lischer, 2010). To examine the difference among the ecoregion QTL-allele matrices, the chisquared test was used to test the independence of the allele frequency distribution among ecoregions for each locus using the PROC FREQ in SAS.

Optimal Cross Prediction
All possible single crosses among the 366 accessions (66,795) were generated in silico under linkage model and independent assortment model for recombination potential of the seed oil traits . For each cross, the predicted genotypic value of seed oil traits was calculated based on 2,000 continuously inbred progenies derived from F 1 individuals and the QTL-allele matrix. The 99th percentile of a cross was used as its predicted cross value for SOC and OAC, and the 1st percentile of a cross was used as its predicted cross value for LAC.

Candidate Gene Annotation
From the detected QTL system in CSLRP, the candidate genes were annotated using the following steps: firstly, the genes located <100 kb away from the associated SNPLDBs were identified based on SoyBase (http://soybase.org); secondly, those genes   containing SNPs in the population were identified; and finally, those genes containing SNPs significantly associated with the detected SNPLDB through chi-square test were considered as candidate genes. The annotations and expression data of candidate genes were retrieved from SoyBase (http://soybase. org), and the genes were grouped into three categories, i.e., biological process, cellular component, and molecular function. The pathway analysis of candidate genes was performed based on SoyCyc Soybean Metabolic Pathway Database (https://soycyc. soybase.org).  Table 2). The whole sample was separated into six sub-samples according to ecoregions in China ( Table 1). The phenotype differences of the seed oil traits among ecoregion means were not large, but large phenotype variation within ecoregion existed, indicating abundant variation in each ecoregion. Among ecoregions, the varieties from ecoregion I (Northeast China) have relatively more SOC and OAC, but less LAC. That might be due to the enhancement of soybean improvement on seed oil traits in this region during the recent decades. Since large variation existed in each ecoregion and the soybean landraces were developed independently by local farmers, exploring QTLallele constitutions of each ecoregion may provide information on genetic improvement potential of the three oil traits.

The QTL Systems of Seed Oil Traits in the CSRLP
The comparison for the number of significantly associated markers obtained from RTM-GWAS and MLM was summarized in Table 2. If correction for multiple testing is not considered (using a normal significance level of 0.05), the MLM method showed tremendous overflowing (R 2 > 100%) of the total phenotypic contribution. But in contrast, a large amount of missing heritability in comparison with the trait heritability (especially for SOC) was observed for the MLM method under the Bonferroni-adjusted significance level (Supplementary Figure 1). However, with the RTM-GWAS method which was based on multi-locus association analysis, a total of 50, 98, and 50 SNPLDBs were detected to be significantly (under a significance level of 0.02) associated with SOC, OAC, and LAC, respectively (Figure 1). The total contribution to phenotype variance of the associated loci were 82.53, 90.29, and 83.84%, which were close to but did not exceed the trait heritability values, 91.29, 90.97, and 90.24%, respectively ( Table 2). A more stringent significance level of 0.0002 was also used in RTM-GWAS, and 13, 19, and 12 in a total of 42 SNPLDBs (two loci overlap) were identified for SOC, OAC, and LAC, respectively. As expected, the SNPLDBs detected using stringent significance level were included in the SNPLDBs under normal significance level with the same p-value order, except that two SNPLDBs for SOC were newly detected using 0.0002 (Tables 3, 4, Supplementary Table 3). The loci excluded due to stringent significance level were mainly small-contribution loci (R 2 < 1%), and the total contribution to phenotype variance decreased slightly to 55.38, 67.47, and 56.63%, respectively. Therefore, major loci can be identified directly from RTM-GWAS results without recalculation using a stringent significance level.
Among the detected QTLs of the three seed oil traits, 23, 21, and 19 large-contribution QTLs (R 2 ≥ 1%) explained a total of 68.77, 69.76, and 66.63% of phenotypic variation, while 27, 77, and 31 small-contribution QTLs (R 2 < 1%) explained a total of 13.76, 20.53, and 17.21% of phenotypic variation, respectively (Tables 3, 4, Supplementary Table 4). In the SOC QTL system, 50 QTLs were detected with each QTL contribution to phenotypic variance varied continuously and greatly from 0.26 to 10.60%, and a group of minor QTLs were detected collectively but not individually with a total contribution of 8.76%. The SOC QTL system composed of both individually and collectively detected QTLs, and similar phenomenon was also observed for OAC and LAC. This indicated that the three seed oil traits are genetically complex traits, and there are great potentials in the improvement of the three oil traits through genetic recombination. Therefore, how to converge all or most of the elite alleles through breeding procedures is to be explored.
The SOC QTLs distributed on 18 chromosomes except Gm14 and Gm19 (Supplementary Table 5). There were 1-10 QTLs located on each chromosome. Gm20 harbored 10 QTL, with a total phenotypic contribution of 25.15%, indicating its genetic importance for oil content. The OAC QTLs distributed on all 20 chromosomes, also with 1-10 QTLs located on each chromosome. Gm09 harbored 10 QTLs, explaining a total of 4.45% of phenotypic variation, while Gm04 had 5 QTLs, explaining a total of 20.46% of phenotypic variation, suggesting Gm04 being of genetic importance in determining oleic content. The LAC QTLs distributed on 17 chromosomes except Gm08, Gm19, and Gm20. There were 1-6 QTLs located on each chromosome, and Gm04 had 6 QTLs, explaining a total of 24.66% of phenotypic variation, suggesting Gm04 being of genetic importance in determining linolenic content.
The QTL-Allele Matrices of Seed Oil Traits of the CSLRP There were 136, 283, and 154 alleles on 50, 98, and 50 QTLs of SOC, OAC, and LAC, respectively, with their allele effects estimated from RTM-GWAS (Supplementary Table 6). The QTL-allele effects in the 366 Chinese soybean landraces   can be organized into 136 × 366, 283 × 366, and 154 × 366 (allele × accession) matrices which were designated as QTLallele matrix of SOC, OAC, and LAC of CSLRP, respectively. This QTL-allele matrix contains all the genetic information of the population and in fact is a matrix of the genetic constitution of the population. Figure 2 showed a compact form of the 50 × 366 (locus × accession) SOC QTL-allele matrix of CSLRP with allele effects presented in color gradient. Each row represents the allele distribution among accessions for a QTL, while each column indicates the allele constitution of an accession over all QTLs. It can be found that no landrace contains alleles that are all with negative or positive effect on the 50 loci, and lines of high oil content have more alleles with positive effect. The population contains mainly alleles with positive effect for QTLs at the top, and the alleles with negative effect are in only several lines. In contrast, the population contains mainly alleles with negative effect for QTLs at the bottom. The OAC and LAC QTL-allele matrices showed similar patterns (Supplementary Figures 2, 3).
The QTL-allele constitution of 10 lowest and highest SOC landraces were listed in Table 5. It can be found that the two landrace groups shared same alleles on nine loci (five alleles with positive effect and four alleles with negative effect), and differentiation existed on 41 loci. The total number of alleles with positive effect in the high-SOC group was 295 (ranging from 27 to 33 with an average of 29.5 per landrace), while was 227 in the low-SOC group (ranging from 19 to 25 with an average of 22.7 per landrace). The landrace N24274 with the highest SOC of 24.1% composed of 31 and 19 alleles with positive and negative effect, respectively, while the landrace N24600 with the lowest SOC of 15.8% composed of 20 and 30 alleles with positive and negative effect, respectively. Similar phenomenon was observed for OAC and LAC (Supplementary Tables 7, 8). Therefore, for the three    (Table 5). Similar results were observed for OAC and LAC. This indicated that in addition to the accumulation of alleles with positive effect, emergence of elite alleles may be another way in the improvement of the seed oil traits.

Genetic Differentiation Among Ecoregion Subpopulations in the CSLRP
The AMOVA showed that significant genetic differentiations existed among ecoregions as well as among landraces within each ecoregion for seed oil traits in the CSLRP (Supplementary Table 9). The within ecoregion variance component was much greater than the among ecoregion variance component, which implied that a great variation happened in each ecoregion after the ancient ancestors moved to different ecoregions. The whole QTL-allele matrix was then separated into six ecoregion matrices for each trait. The independence of the allele frequency distribution of detected QTL among the ecoregions was tested with Chi-square criterion, and 28, 56, and 30 QTLs showed significant differentiation among the ecoregions for SOC, OAC, and LAC, respectively (Supplementary Table 6).
It was assumed that allele with the highest frequency was the original allele of one locus while allele with lowest frequency was newly happened or mutant allele. There are six kinds of differentiation patterns (Table 6): (1) As Oil-a-03-1 and Oil-a-11-1, the first major allele was positive while negative allele was its mutant, multiple alleles existed on a locus and all the alleles disseminated to all six ecoregions or commonly shared by all ecoregions.
(2) As Oil-a-15-2, the first major allele was negative while positive allele was its mutant, multiple alleles existed at a locus and all the alleles disseminated to all six ecoregions. (3) As Oil-a-03-2, only two alleles existed on this locus, one was dominant, the other was newly happened but disseminated to other ecoregions, the major allele was negative while positive was its mutant. (4) The opposite situation to the third pattern with the major allele was positive. (5) As Oil-a-18-2 and Oil-a-20-3, the first major allele was negative while positive allele was its mutant, the three major alleles disseminated across all the ecoregions, while another three alleles, as newly happened, only disseminated in some ecoregions. (6) As Oil-a-20-7, the opposite situation to the fifth pattern with the first major allele was positive.
Based on the individual QTL differentiation among ecoregions, ecoregion matrices differentiated accordingly. The obvious differentiation appeared in the existence of specific ecoregion alleles. More than 63% alleles in each trait were found in all six ecoregions while some alleles (<5% for each trait) existed in two or even one ecoregion(s) which were considered as specific ecoregion alleles (Supplementary Figure 4,  Supplementary Tables 6, 10). Among the six ecoregions, ecoregion I, and ecoregion II had more ecoregion specific alleles than other ecoregions, for example, allele "7" of Gm04_BLOCK96_9670209_9789832 existed only in ecoregion I. Recognizing ecoregion-specific alleles is of great importance in studying allele evolution and ecoregion-allele relationships.

Genomic Selection for Optimal Crosses in Recombination Breeding
Based on the detected QTL-allele matrices, the recombination potentials of all possible single crosses were estimated using withand without-linkage prediction model, and more recombination cycles is needed for realization of the without-linkage prediction in breeding programs ( Table 7). There was no large difference in predicted values between the two models, and therefore results from with-linkage prediction were mentioned in the present study. The maximum value within ecoregions for SOC and OAC could be achieved as 24.20% (ecoregion II) and 38.38% (ecoregion I), while the maximum value among different ecoregions for SOC and OAC could be achieved as 24.76% [N05283.2 (III) × N05193 (II)] and 40.30% [N23538 (I) × N23561 (II)], indicating that the crosses with parents from different ecoregions could achieve higher transgressive value for SOC and OAC. However, the minimum value within ecoregions for LAC was 2.37% [N24278 (I) × N23538 (I)], which was even less than the value among the different ecoregions. From the prediction results, 15 potential crosses within and among ecoregions for each trait were recommended for seed oil traits breeding programs (Supplementary Table 11). These predictions indicate a great potential of seed oil traits improvement through recombination breeding in CSRLP based on the genetic dissection of population using RTM-GWAS.
The optimal crosses for comprehensive improvement of SOC, OAC, and LAC were also estimated based on the prediction ( Table 8). This was picked up from individual results of each trait. Since the individual trait prediction was based on linkage model, the comprehensive prediction should also be considered to include linkage information. Among the top 20 comprehensive high seed oil trait crosses, the best ones were from crosses between an ecoregion I parent (such as N23679 and N23538) and an ecoregion II parent (such as N09445 and N05193). It seems that the best seed oil trait parental materials are mainly located in ecoregion I with several materials from other ecoregions.

The Candidate Genes That Confer the Seed Oil Traits in the CSLRP
From the detected 13, 19, and 12 major QTLs of the three seed oil traits, a total of 38, 27 and 25 genes were annotated for SOC, OAC, and LAC, respectively (Supplementary Tables 12-14). These candidate genes, 28, 15, and 15 genes were selected according to chi-square test and grouped into three GO categories, i.e., biological process, cellular component, and molecular function for SOC, OAC, and LAC, respectively (Figure 3, Supplementary Figure 5).
In the present gene systems of three seed oil traits, there were eight genes on two loci shared by OAC and LAC with major contribution. It seemed that more shared genes appear between the two fatty acid traits rather than with SOC. However, SOC as a super trait should have a more genetic relationship with its fatty acid components theoretically, and therefore further studies on SOC are needed.

DISCUSSION
The Advantages of the Innovative RTM-GWAS Procedure Firstly, RTM-GWAS is based on multi-allelic SNPLDB marker rather than bi-allelic SNP marker used by other GWAS methods. The SNPLDB marker can match genetic locus with varied number of alleles, and detecting QTL by SNPLDB marker should be more effective and powerful than bi-allelic SNP marker for germplasm populations. Secondly, RTM-GWAS is based on an efficient two-stage association analysis, where the markers were preselected by single-locus model followed by multi-locus model stepwise regression. In multi-locus model analysis, loci are jointly fitted and tested in a joint linear model, and the experiment-wise error is controlled under the normal significance level. Therefore, no additional multiple correction is needed. This is different from GWAS methods based on single-locus model where a large number of independent statistical inferences are considered and the experiment-wise testing has to be completed through correction, such as Bonferroni correction, while the correction in fact is not a model test but an arbitrary adjustment. Since QTL detection is carried out at the second stage under multi-locus    Frontiers in Plant Science | www.frontiersin.org model in RTM-GWAS, the total variation explained by detected QTL will not overflow trait heritability. Although there are already GWAS methods that implement multi-locus model, such as MLMM (Segura et al., 2013) and mrMLM (Wang et al., 2016), they are designed for bi-allelic SNP marker only and are not applicable to the multi-allelic SNPLDB marker. However, the mrMLM method based on SNP marker was also performed and 12, 25, and 9 SNPs in a total of 46 SNPs were detected with 37.16, 51.83, and 36.06% phenotype variation contribution for SOC, OAC, and LAC, respectively (Supplementary Table 15). In comparison with mrMLM, 13, 19, and 12 SNPLDBs in a total of 42 SNPLDBs were detected with 55.38, 67.47, and 56.63% of phenotype variation contribution using RTM-GWAS under the same significance level of 0.0002 (Supplementary Table 3). Among the 46 SNPs, 7 SNPs with 1.09-3.76% phenotype variation contribution were found to overlap with 7 large contribution (R 2 > 1%) QTLs identified by RTM-GWAS. RTM-GWAS detected the large effect loci detected by mrMLM, but it should be noted that mrMLM results were based on SNP marker while RTM-GWAS results were based on SNPLDB marker, and it might be not an exact comparison since multiple SNPs included in a SNPLDB which may provide more allele information than a single SNP. In fact, each GWAS method has its own advantages and disadvantages and fits respective purposes. That is why researchers try to improve it from different aspects. For example, some researchers aimed to find a handful of quantitative trait nucleotides (QTN) for identifying some major genes. As a quantitative trait, especially such as oil content which is the final product of a series of biological processes, is conferred by many genes or QTLs, what we concerned is how to identify the genetic system of a trait in a germplasm population, and this is different from targeting on a few loci for individual gene study. Unfortunately, our results have not reached the goal yet due to the difficulty in controlling the experimental error and there is still a relatively small part of the genetic variation to be explained. Obviously, the present study provided more QTL-allele information than previous linkage mapping results with more precision and less expense. One reason is that materials used in the present study are a gene reservoir of the Chinese landrace population as from which the soybean origin area covering a wide range of genetic variance, and another reason is the high efficiency of RTM-GWAS procedure which provided the detected QTLs with 82.52-90.29% contribution to the phenotypic variation.

Population Characterization Based on QTL-Allele Matrices
Based on the relatively thorough detection of QTL-alleles through RTM-GWAS procedure, the QTL-allele matrix has provided a new tool for characterizing the populations. From the QTL-allele matrix, all kinds of genetic parameters can be obtained, such as QTL number, allele number, allele frequency, allele effect, genetic diversity, etc. The genetic differentiation among populations can be detected based on individual QTL, a group of QTLs, and a subpopulation. From the changes of allele frequencies, the evolutionary relationship among alleles can be detected. If the QTL-allele matrix is linked to ecological conditions, eco-genetics knowledge can be further revealed. In the present study, we have tried to conduct the analysis, but further results are to be explored. We believe the genome-wide QTL-allele matrix may be an important tool in population genetic studies.

Approaches to Achieve Genomic Selection for Transgressive Seed Oil Traits in the CSLRP
The genetic structure in term of QTL-allele matrix of the CSLRP showed that both positive and negative alleles existed in each accession for the seed oil traits, denoting a great recombination potential. According to prediction based on the linkage model, the SOC and OAC can be achieved as high as 24.76 and 40.30% and the LAC can be achieved as low as 2.37%. In plant breeding, this is the first stage of genomic selection, selection for optimal cross. The selected crosses are potential for next genomic selection stage, selection for best progenies to realize the potential or to obtain progenies with 24.76% SOC. There are some difficulties associated with classic genomic selection based on GEBVs (Meuwissen et al., 2001) in plant breeding, i.e., the large number of segregating progenies involving extremely high genotyping cost and uncertainty of marker-breeding value relationship due to the black box procedure. According to the present study, all genome-wide 50, 98, and 50 QTLs along with their 136, 283, and 154 alleles rather than all genomewide SNPs of progenies should be genotyped for SOC, OAC, and LAC improvement, respectively. In such case, a small marker chip can meet the requirements of genomic selection for superior progenies, or other high throughput molecular marker technologies, such as high throughput PCR, is to be further explored. Therefore, the suggested novel GS procedure based on QTL-allele matrix in plants composes of GS for optimal crosses and GS for progeny selection. The prerequisite of this novel GS strategy is the precise and thorough QTL-allele dissection in the gene reservoir. The above example of genomic selection for optimal crosses involves only three seed oil traits (Table 8). Since plant breeding usually involves multiple traits, genomic selection for comprehensive optimal crosses can be conducted by using multiple matrices or a weighted combination of multiple matrices. Similarly, GS for progenies incorporating multiple traits can be achieved using multiple trait markers or a weighted combination of multiple trait markers.
In addition, the allele effects obtained from the RTM-GWAS were additive effects since the materials used were inbred landraces. The additive by additive interaction effect was not considered in RTM-GWAS, but it was usually not large according to the reported linkage mapping results, especially for the three oil traits. Therefore, we recommend the prediction results should be relatively relevant to breeding programs.

Candidate Major QTL/Genes of Seed Oil Traits for Further Study
In addition to using the whole QTL-allele information in genomic selection for breeding, the information revealed through major QTL/genes can be further studied for exploring the gene network of the traits. The two QTLs located on chromosome Gm04, and Gm18 along with their annotated genes are important for further study as mentioned in the above text. Furthermore, chromosome Gm20 with four major SOC QTLs explaining up to 24.07% of the phenotypic variation was the most important chromosome for SOC, on which a number of studies also reported some SOC QTL using linkage mapping and association mapping. But no major QTL was found on chromosome Gm20 for both OAC and LAC. Chromosome Gm04 was found to be important for OAC and LAC, explaining up to 19.54 and 21.28% of the phenotypic variation, respectively, but no major QTL was found for SOC. These results suggested that seed oil content and fatty acid content may be determined by different QTLs on different chromosomes. Therefore, a great potential exists for recombination between seed oil content and fatty acid content in the CSRLP.
The expressional level of the candidate genes and the pathway of the candidate genes were analyzed, but only six QTL/genes were found to be related to OAC and LAC. Gene expression analysis showed that Glyma10g01590 (Oleic-a-10-1), Glyma11g30090 (Oleic-a-11-4), Glyma13g23800 (Oleic-a-13-4), Glyma17g34450 (Oleic-a-17-5), and Glyma17g34510 (Oleic-a-17-5) were showed high expressional levels at oil accumulation stage of seed development. Pathway analysis showed that only Glyma09g17170 (Linolenic-a-09-1) was fatty acid metabolism gene. The results indicated that oil content and fatty acid content were complex traits, and the existing information is relatively limited, further studies are needed to find major seed oil QTL/genes. In addition, GWAS results of a trait in germplasm population are obtained from one-direction inference, and validation of the results finally depends on finding all the genes through experimental molecular biology. Since our results can detect many more QTLs/genes in comparison to other GWAS procedures with only a handful QTLs detected, it seems difficult to completely validate all the QTLs through gene cloning in a short period. More effort is needed to explore the gene system from the QTL system of the three seed oil traits.     Table 7 | QTL-allele constitution of twenty accessions with the lowest and highest oleic acid content (%) for the detected oleic acid QTLs in CSLRP.

AUTHOR CONTRIBUTIONS
Supplementary Table 8 | QTL-allele constitution of twenty accessions with the lowest and highest linolenic acid content (%) for the detected linolenic acid QTLs in CSLRP. Table 9 | Analysis of molecular variance among six ecoregions of CSLRP based on the SNPLDBs associated with three seed oil traits.

Supplementary
Supplementary Table 10 | The ecoregion-specific alleles on the QTLs of three seed oil traits.
Supplementary Table 11 | The predicted optimal crosses of the seed oil traits within and among ecoregions.