Novel QTL and Meta-QTL Mapping for Major Quality Traits in Soybean

Isoflavone, protein, and oil are the most important quality traits in soybean. Since these phenotypes are typically quantitative traits, quantitative trait locus (QTL) mapping has been an efficient way to clarify their complex and unclear genetic background. However, the low-density genetic map and the absence of QTL integration limited the accurate and efficient QTL mapping in previous researches. This paper adopted a recombinant inbred lines (RIL) population derived from ‘Zhongdou27’and ‘Hefeng25’ and a high-density linkage map based on whole-genome resequencing to map novel QTL and used meta-analysis methods to integrate the stable and consentaneous QTL. The candidate genes were obtained from gene functional annotation and expression analysis based on the public database. A total of 41 QTL with a high logarithm of odd (LOD) scores were identified through composite interval mapping (CIM), including 38 novel QTL and 2 Stable QTL. A total of 660 candidate genes were predicted according to the results of the gene annotation and public transcriptome data. A total of 212 meta-QTL containing 122 stable and consentaneous QTL were mapped based on 1,034 QTL collected from previous studies. For the first time, 70 meta-QTL associated with isoflavones were mapped in this study. Meanwhile, 69 and 73 meta-QTL, respectively, related to oil and protein were obtained as well. The results promote the understanding of the biosynthesis and regulation of isoflavones, protein, and oil at molecular levels, and facilitate the construction of molecular modular for great quality traits in soybean.


INTRODUCTION
Soybean (Glycine max L. Merr.), a leguminous plant that originated from China, is one of the most important crops globally, for it is rich in isoflavones, protein, and oil (Figure 1).
Soy isoflavone, a kind of plant secondary metabolite, belongs to a group of 3-phenyl derivatives synthesized by cinnamyl-CoA. Soy isoflavone is generally classified into four main categories: aglycones (AGL), glycosides, acetylglycosides, and malonylglycosides (Sun et al., 2011;Ku et al., 2020). Each category can be further divided into three kinds as well. The AGL include daidzein (DAE), glycitein (GLE), and genistein (GEE). The glucosides (GLU) include daidzin (DA), glycitin (GL), and genistin (GE). The acetylglucosides include acetyldaidzin, acetylglycitin, and acetylgenistin, while the malonylglucosides include malonyldaidzin, malonylglycitin, and malonylgenistin (Sun et al., 2011;Chen et al., 2021). All aglycones are derived from the phenylalanine pathway and can be synthesized into glycosides by the reaction with UDP-Glucose, while glycosides can be synthesized into other two main categories by the increase of the acetyl or malonyl group. Isoflavones are referred to as phytoestrogen (Dixon, 2004), for their structure is similar to estrogen. Nowadays, this secondary metabolite has been widely applied to the clinic, for it could reduce blood pressure, prevent hormone-dependent cancers, alleviate menopausal symptoms, and has numerous other features as well. Isoflavones also play an irreplaceable role in plant disease-resistance, insect resistance, and many other types of stresses (Graham et al., 2007;Yamaguchi et al., 2010), for they are the precursors of major phytoalexins in the system of plant defense responses (Fett, 1984).
Soy protein, generally referred to as the crude protein of soybean, is divided into globulins and albumins based on the solubility patterns. The globulins, categorized as 7S vicilintype and 11S legumain-type, are the most abundant protein in soybean. Soy protein plays a vital role in human health, for it provides a good balance of essential amino acids for the human diet, and is strongly correlated with lower cholesterol levels and a reduced risk of cardiovascular diseases (Chen et al., 2021). Although breeding soybean cultivars with high protein content has been a major target for decades, the intricate and indistinct genetic background of protein accumulation in plants has hampered this process.
Soybean oil, an important quality trait-like isoflavone and protein, contains five types of fatty acids: saturated palmitic acids, stearic acids, unsaturated oleic acids, linoleic acids, and linolenic acids . In general, soybean seed is abundant in linoleic acids and linolenic acids, which are essential fatty acids (EFA) for humans. The low saturated fatty acid levels in soybean oil may reduce the risk of coronary diseases and cancer if the ratio of soybean oil in the human diet is increased (Hu et al., 1997). Thus, similar to isoflavone and soy protein, soybean oil could also play a key role in human health.
Due to the crucial importance of these quality traits in plant growth and human health, many studies investigated the accumulation of isoflavones, protein, or oil in soybean seed with final aims to account for the genetic background of these traits, as well as to breed cultivars with high isoflavone, protein, or oil contents . Previous researches suggest that all these quality traits are typically quantitative characters and depend on both environmental and genetic factors (Mao et al., 2013;Zhang et al., 2014;Pei et al., 2018). The discrepancies of these traits between RILs with high, intermediate, and low contents have been presented to be relatively consistent in different environment conditions Huang et al., 2020). Up to now, as the Soybase database (Grant et al., 2010) 1 shows that there are 261 quantitative trait locus (QTL) associated with protein content and 315 QTL associated with oil content in soybean. In addition, among these 576 QTL, 5 QTL are associated with seed oil plus protein (Chen et al., 2007). Two-hundred ninety-seven QTL related to isoflavones have been detected in soybean including 61 QTL related to seed daidzein, 68 QTL related to seed genistein, 71 QTL related to seed glycitein, and 11 QTL related to seed total isoflavone.
Identifying the number of stable QTL for these quality traits in soybean is essential to understand the genetic factors, unfortunately, there were two inadequacies in previous researches: the low-density genetic map and lack of QTL integration.
Most of the QTL related to soybean isoflavone, protein, or oil contents in previous studies were based on the lowdensity genetic maps with low throughput molecular markers, such as restriction fragment length polymorphism (RFLP), the variable number of tandem repeats (VNTR), and simple sequence repeat (SSR), which resulted in the low efficiency and accuracy of QTL mapping. Nowadays, with the development of sciences (Jiang, 2000) and DNA sequencing technologies, the QTL mapping can be accomplished based on the high throughput molecular markers and high-density genetic maps. High-throughput sequencing, a next-generation sequencing (NGS) technology, is a powerful technique to construct a highdensity genetic map based on a large-scale identification of single nucleotide polymorphisms (SNP) markers (Depristo et al., 2011), and is an efficient tool to map QTL.
Since the QTL is mostly derived from diverse populations and environments in different studies, the integration of QTL is an efficient way to obtain the stable QTL through comparing and combining the QTL from varied researches, as well as of crucial importance in understanding the complicated quantitative characters. The meta-analysis, a method to integrate data from different sources into a single study, has been used mainly by researchers in medical, social, and behavioral sciences (Borenstein et al., 2009). Goffinet and Gerber (2000) reported a technique to combine QTL mapped from multiple independent experiments, and provided a modified Akaike criterion that could be applied to determine which QTL was actually represented by the QTL acquired from different studies. Meta-QTL, the QTL integrated from multiple experiments using meta-analysis and consisting of shorter confidence intervals relative to the original QTL, could be more representative and accurate.
The aims of this research were: (1) to detect more stable QTL associated with isoflavone contents (both individual and total isoflavone), protein, and oil based on a high-density genetic map; (2) to integrate the QTL obtained from the previous researches according to meta-analysis method; (3) to predict candidate genes which may influence the accumulation of these quality traits using gene functional annotation. The results could facilitate elucidating the molecular mechanism of isoflavone, protein, and oil biosynthesis and regulation, as well as constructing molecular modular of great quality traits in soybean.

Materials and Field Experiment
To construct a recombinant inbred lines (RIL) population for QTL mapping, the single-seed descendent method (Fehr, 1991) was adopted. An F7 population with 160 lines derived from 'Zhongdou27' and 'Hefeng25' were used, which was named as the ZH RIL population. 'Hefeng25, ' a higher-yielding cultivar with an isoflavone content of 3199 µg/g, protein content of 40%, and oil content of 20.93%, was mainly planted in Northeast China and planted in Yunnan Province, Hebei Province, and Xinjiang Autonomous Region as well. 'Zhongdou27' a late-maturing cultivar with an isoflavone content of 5290 µg/g, protein content of 38.94%, and oil content of 19.94%, is mainly planted in the Huang-Huai-Hai region.
To map the stable QTL related to isoflavones, protein, and oil contents, the 160 RIL and parental lines were planted in three locations with different climates, including Harbin, Mudanjiang, and Hailun in 2020 (Figure 2). A randomized complete block design with three replications was implemented in each location. These materials were sown in rows 3 m long,.65 m wide, and with a distance of.08 m between the individual plants. Field management followed normal soybean production practices for each environmental condition.
All lines of the RIL populations were fully matured in the Harbin and Mudanjiang sites, and only a few lines were not fully matured in the Hailun experimental site, but they were all physiologically matured.

Phenotype Identification
For the measurement of protein and oil contents in soybean seed, an Infratec 1241 Analyzer (FOSS, Sweden) was used. This is a whole grain analyzer using near-infrared transmittance technology and whose major advantages are rapid, accurate, and non-destructive (Nicolai et al., 2007;Lee et al., 2010). Detailed operations can be acquired on the operating manual of Infratec 1241 Analyzer. Briefly, 80 (or more) soybean seeds were used in the measurement (repeated 10 times), and the result was the average of 10 independent experiments.
For the measurement of isoflavone contents in soybean seed, High-Performance Liquid Chromatography (HPLC) was performed.
The first step was isoflavones extraction. Around.500 g soybean powder was ground by ball milling and sifted by sifter with.002 m aperture, and then placed into a 15 ml centrifuge tube (BIOSHARP, China). The sample was extracted by ultrasonics at 60 • C for 30 min, using 9.0 ml 90% (v/v) methanol solution (FUYU, Tianjin, China) as a solvate. The resulting slurry was centrifuged at 5,000 rpm for 5 min, and the supernatant was collected by a 25 ml volumetric flask (TIANBO, Tianjin, China). Then, the sediment was extracted twice, using 6.0 ml 90% (v/v) methanol solution as a solvate. After centrifugation, all the supernatant was collected into the volumetric flask, diluted with 10% (v/v) methanol solution to volume, and mixed intensively. The extracted solution was filtered through a.45 um nylon syringe filter (BIOSHARP, China), and injected into an autosampler vial (SHIMADZU, Japan), and stored at 4 • C prior to the followup step.
The second step was the HPLC analysis. An LC-10AT HPLC (SHIMADZU, Japan) was used, which equipped a C18 column (4.6 mm × 250 mm, 5 um; AGELA TECHNOLOGIES, Tianjin, China), and controlled by a CLASS-VP V6.1 program. The gradient solution program of HPLC is presented in Table 1, and the conditions are as follows: the solvent flow rate of 1.0 ml/min, the temperature of the column at 40 • C, and the SPD-10A detector monitored eluants at 260 nm. The standards of 6 isoflavone components, i.e., DAE, GLE, GEE, DA, GL, and GE (ChromaDex, United States) were used for calculation and analysis. All these six isoflavones were accurately separated in their retention times. The individual isoflavone content was estimated by the specific value of peak area between the standards and samples. The glucoside content was calculated by summing up the contents of DA, GE, and GL, while the aglycone contents were counted by DA, GE, and GL. The sum of the six components, DAE, GLE, GEE, DA, GL, and GE was calculated in the current study, for the glucosides and aglycones are the most major compounds of isoflavones in soybean seed, which could represent the total isoflavone content though it might underestimate it a little bit.

Phenotype Statistical Analysis
For analyzing the phenotype data of ZH RIL, the ANOVA, analysis of heritability (h 2 ) and coefficient of variance (CV) were utilized. For analyzing the ANOVA and CV of the isoflavones, protein, and oil contents in soybean seed, the SPPS Statistic 24.0 (IBM, NY, United States) was selected. The statistic model and formula used in the analysis are as follows: (1) where y ijk represents the contents of these quality traits of i the genotype which is planted in j the environment under k the repeat, µ represents the mean value, G i represents the effect of i the genotype, L j represents the effect of j the environment, GE ij represents the interaction effect between i the genotype and j the environment, R jk represents the interaction effect between j the environment and k the repeat (block), and e ijk represents residual error; where σ presents the standard deviation of the quality traits, and m presents the mean value.
For the analysis of heritability in the broad sense (h 2 B ) of isoflavones, protein, and oil contents in soybean seed, the R package'lme4' (Bates et al., 2014) was utilized, which is a program used to determine the maximum likelihood or restricted maximum likelihood (REML) estimates of the parameters in linear mixed-effects models. The formula used to calculate the h 2 B of each trait is as follows (Wyman and Baker, 1991): Frontiers in Plant Science | www.frontiersin.org where V G and σ 2 G refers to genetic variance, V E refers to environmental variance, σ 2 GE refers to the variance of genotype and environment interaction in the three experiment locations, σ 2 E refers to the variance of the three environment locations, n refers to the number of experiment locations, and r refers to the repetitions in each experiment location.

High-Density Genetic Map Construction
For the high-density genetic map construction, the resequencing with a high coverage level (more than 30-fold) was utilized to sequence the genome of parental lines, and the resequencing (more than 3-fold) was used to sequence the genome of RILs. The conference genome, Wm82.a2.v1, contains 995.708 Mbp of assembled and annotated sequences.
The first step was to construct the DNA libraries of these samples. Total DNA of each parental line and RIL lines was obtained from young and fully developed leaf tissues collected 20 days after emergence, by the cetyltrimethylammonium bromide (CTAB) DNA extraction method. The DNA libraries of 162 lines were sequenced on an Illumina HiSeq 2500 platform (Illumina, CA, United States) and analyzed by an Illumina Casava 2.17 (Illumina, CA, United States) (Levy and Myers, 2016). Then, pair-end sequences, the raw reads as well, with 150 bp long was acquired. The raw reads were filtrated by the quality control (QC) process, and the clean raw reads were obtained. According to the criterion of filtration, the adapter sequences, reads with lowquality (over 50% base with Phred score less than 10), and reads with more than 10% unidentified nucleotides (N) were deleted. After that, the clean reads of these samples were aligned against the reference genome Wm82.a2.v1 (Schmutz et al., 2010) 2 , by the Burrows-Wheeler Aligner (BWA) (Li and Durbin, 2009), which is a program used in comparing the short reads and reference genome. The alignments were formatted and converted into BAM files using SAMtools software , for SNP calling and genotyping.
Secondly, the Genome Analysis Toolkit (GATK) (Mckenna et al., 2010), Picard and Snpeff (Cingolani et al., 2012) was used to detect and annotate the SNP (Depristo et al., 2011) among parental lines and the reference genome, RIL and reference genome, and 2 parental lines, respectively. Detailed operations can be acquired on operating manual 3,4 . Briefly, the results aligned by BWA were used to delete replication and block the influence of PCR-duplication by Picard. GATK was used in Insert-Deletion (InDel) realignment, base recalibration, and variant calling (include InDel and SNP). The strict standards used to filtrate the SNP were as follows: the SNP cluster (the distance between 2 SNPs less than 5bp), the distance between SNP and InDel less than 5bp, and the distance of 2 InDels less than 10bp (Mckenna et al., 2010).
Finally, the sliding window method with 15 SNPs per window and 1 SNP per step were used to determine the genotype and exchange sites of each RIL and parental line. The bin markers of RILs and parental lines were determined as well. The high-density

Quantitative Trait Locus Identification, Meta-Quantitative Trait Locus Analysis, and Candidate Genes Prediction
To map the QTL of individual and total isoflavone, protein, and oil contents in soybean seed, the CIM (Zeng, 1994) method and R package 'RQTL' (Broman et al., 2003;Arends et al., 2010) were used. The threshold of LODs for declaring effective QTLs was determined using a permutation test (PT) with a significance level of p < 0.05 (n = 1000). QTL with LODs greater than 3.00 would be accepted only. The naming schemes of QTL were: q -trait name -linkage group ID -region number.
To integrate the QTL of isoflavones, protein, and oil, the meta-analysis methods (Goffinet and Gerber, 2000;Borenstein et al., 2009) and BioMercator V4.2.3 software (Arcade et al., 2004) were used. Based on the Soybase database, the information of QTL collected in this research was: QTL names, linkage groups, genetic and physical positions (in Williams 82.a2.v1), LODs, PVE, parental lines, and mapping population types. The QTL collected in this study were renamed according to the following rule: q -trait name -m -QTL number. Then, the QTL was integrated into the public composite genetic map 5 using BioMercator V4.2.3. Detailed operations can be acquired on the User Guide BioMercatorV4 6 . Then, the meta-QTL with more than 2 cm interval lengths were filtered out. Finally, the meta-QTL of isoflavones, protein, and oil were selected from four mathematic models combined with the minimum Akaike Information Criteria (AIC) (Goffinet and Gerber, 2000), and the naming schemes of meta-QTL were: q -trait name -ME -LG -QTL number.
To predict and annotate the candidate genes, the genome sequences corresponding to the QTL intervals were analyzed based on the Phytozome database 7 . Then, the sequences were used in pathway analysis with KEGG database (Kyoto Encyclopedia of Genes and Genomes 8 ), and further annotated via Basic Local Alignment Search Tool (BLASTX) (Altschul et al., 1997) with the NR database (non-redundant protein sequences 9 ) and the Clusters of Orthologous Groups of proteins (COG) database 10 . In addition, the GO database and the SwissPort database 11 were used in gene functional annotation.
To select the candidate genes, based on the RNA-Seq atlas obtained from Soybase (Grant et al., 2010;Severin et al., 2010), we analyzed the dynamic variation of candidate genes expression during the accumulation of isoflavones, protein, and oil. With the combination of gene functional annotation and public transcriptome data, the candidate genes conforming to the following characteristics were supposed to be the major genes: (1) the variation of genes expression level in seed might exhibit the bell-shaped curve, if they could promote the accumulation of these quality traits; (2) the variation of genes expression level in seed might conform to the smiling curve, if they could suppress the accumulation of these quality traits; (3) high level of genes expression during R5 to R8 should be found in the seeds rather than other tissues, if they involve in the biosynthesis of isoflavones and oil directly; (4) high level of genes expression might be found in root nodule, if they could promote the accumulation of protein directly; (5) the variation of genes expression should be ahead of the dynamic change of trait contents significantly, if they are regulated genes or participated in the upstream of the biosynthesis pathways of these traits. Meanwhile, the expression variation of reported genes in soybean seed, IFS, F3H, MYB11, CHS7, and CHS8, were used as standards to characterize the expression level of different type genes during the accumulation of isoflavones; while the genes MYB73, FAD2-1A, DGK7, and PEPC, were selected as background to describe the accumulation of soybean oil. The variation of module formed by GY1, CG-1, MYB118, and PEPC represented the dynamic variation of soybean seed oil contents (Table 2 and Figure 3).
Supplementary Table 1 and Figure 4 also showed that the variation between the contents of isoflavones in different locations was extremely significant. This consistency and variation appeared in oil and protein as well. The TIF (average value) in the three locations was: 3525 µg/g in Hailun, 3217 µg/g in Harbin, and 2684 µg/g in Mudanjiang. With the decrease of annual average temperature, the TIF was obviously on the rise. This tendency was also significant in the glucoside, aglycone, and 6 isoflavone components. However, this trend was totally opposite in the oil and protein contents, for both the protein and oil contents in Mudanjiang (protein: 39.92%; Oil: 20.59%) were higher, compared with Harbin (protein: 39.28%; Oil: 20.24%) and Hailun (protein: 37.08%; Oil: 19.94%).
According to Supplementary Figure 1, the contents of these quality traits were significantly segregated and showed a typical skewed normal distribution (Skewness =0), which conformed to the features of quantitative character. The determined lines with a higher or lower transgressive inheritance could be used to further investigate the major genes related to these quality traits, as well as breed cultivars with great quality traits. Table 3 provides an overview of the statistical analysis. The range of CV was 0.03-0.47, indicating that the variation of these phenotypes was significant and reasonable in this study, and this population was suitable for QTL mapping. The great fitting degrees (F value: 27.57-454.9) and extremely high significance (P < 0.001) of ANOVA showed that the environmental effect was extremely significant in these phenotypes; and the h 2 B (0.57-0.89) indicated that the genetic effect was also high-impact.

High-Density Genetic Map Construction
Based on the resequencing technology, a total of 561.95 Gbp clean data, with a Q30 ratio of more than 90%, of the ZH RIL population were collected, including Zhongdou27 (37.43 Gbp), Hefeng25 (27.29 Gbp), and 160 RILs (501.23 Gbp). Furthermore, a total of 1,283,813 SNP between the parental lines were detected, which included 1,093,273 SNP also detected in RIL. There were 283,855 InDel among the parental lines and 101,602.8 InDel per line between the RIL and parental lines. Finally, a total of 5,338 bin markers were obtained based on the sliding window method with these data, and a total length of 2,487.17 cM high-density genetic map with an average distance of.47 cm was constructed ( Figure 5A and Table 4). The collinearity analysis between the linkage map and reference genome showed that the Spearman correlation coefficients were almost greater than 0.99, which revealed that the linkage map was highly accurate (Figure 5B).

Quantitative Trait Loci Identification
A total of 41 QTL located on 12 linkage groups (LG) were discovered through CIM in this study, including 38 novel QTL and three reported QTL. There were 27 QTL related to isoflavones (six isoflavone components, glucoside, aglycone, and TIF), seven QTL associated with protein, four QTL linked with oil, and three QTL involved in protein plus oil ( Table 5). Two novel QTL, qDAE0403 and qGL1102, were stable, for they exhibited high LODs in all environmental conditions. Nine novel QTL, including qDA0403, qGLU0403, qAGL0403, qTIF0403, qAGL0501, qDA0502, qDAE0502, qGLU0502, and qTIF0502, were sub-stable QTL for they had high LODs in two environmental conditions. Among the QTL associated with isoflavones contents, the max LOD of QTL, the phenotypic variance explained by QTL (PVE), and the additive effects contributed by QTL ranged from 3.03 to 18.89, 0.88 to 5.28%, and -5.11 to -543.09, respectively. The properties of QTL associated with soy protein were: max LOD (2.21 to 7.34), PVE (0.41 to 2.05%), and additive effects (-0.457 to 0.377). As for the QTL associated with soybean oil, the range of max LOD, PVE, and additive effects of QTL were 2.21 to 7.34, 0.71 to 2.05, and -0.46 to 0.38, respectively. The range of LOD, PVE, and additive effects of QTL related to protein plus oil were 2.72 to 4.51, 0.92 to 1.16%, and −0.457 to 0.334, respectively. On the basis of the GO and Phytozome database, a total of 2,203 genes were obtained in this study ( Table 6). According to the pathway analysis with the Kyoto Encyclopedia of Genes and Genomes (KEGG) database, a total of 93 pathways which involved 39 QTL (except for qPRO0902 and qGLE0903) and 830 genes were analyzed, including several pathways directly related to the biosynthesis of isoflavones (e.g., phenylalanine metabolism, flavonoid biosynthesis), oil (e.g., TCA cycle, fatty acid metabolism), and protein (e.g., tyrosine metabolism), as well as some pathways who might regulate the biosynthesis (e.g., aminoacyl-tRNA biosynthesis, basal transcription factors). The results obtained from the correlation analysis (Bastian et al., 2009) between pathways and QTL were summarized in Figure 6. Further combining with the results of annotation based on GO, SwissPort, NR, and COG database, 3,536 protein and 3,093 kinds of function related to these candidate genes were predicted. The function classification of stable and sub-stable QTL is summarized in Figure 7.
Further integrating the expression analysis with public transcriptome data, a total of 2,203 candidate genes were selected, including 143 genes related to isoflavones (6 isoflavone components, glucoside, aglycone, and TIF), 300 genes associated with protein, 152 genes involved in oil, and 60 genes linked with protein plus oil ( Table 6). The expression dynamic variation of these genes during 10 DAF to 42 DAF were conformed to our hypothesis. The stable QTL, qDAE0403, contain 21 genes; and the other one, qGL1102, contain nine genes ( Table 7). The major genes of nine sub-stable QTL and three reported QTL were predicted as well ( Table 7).

Meta-Analysis
A total of 1,034 QTL were collected from previous studies and used in meta-analysis (Figure 8 and Supplementary Table 2), including 368 QTL associated with seed oil, 316 QTL related to seed protein, and 350 QTL correlated to isoflavones. We mapped 212 meta-QTL, respectively, related to isoflavones, oil, and protein, which were distributed on all linkage groups (Supplementary Table 3), and this meta-QTL was generated from more than 1 original QTL. Of this meta-QTL, 55 QTL were located on the interval length with no more than 1 cm.
There was 7 meta-QTL associated with aglycone, which were located on the 5 LG (6th, 7th, 10th, 13th, and 17th LG) and whose AIC value varied from 7.29 to 446.06. Of these QTL, qAGLME061 derived from two original QTL had the shortest interval length (0.98 cm). In addition, only 1 meta-QTL (qTIF2ME031) linked with TIF2 and four QTL associated with TIF3 were mapped in this study.
There was five meta-QTL related to DAE with a low AIC value (13.88 to 64.92), which were mapped in the 2 LG including 4th and 5th LG. The interval length of these QTL ranged from 1.26 to 2.00 cM.
A total of 28 meta-QTL of GEE was mapped in this study, which was in 8 LG (4th, 5th, 6th, 7th, 8th, 12th, 13th, and 17th LG). Of these QTL, qGEEME121 was located in a 0.58 cM region and qGEEME124 contained a 0.61 cM confidence interval, which was the shortest interval length. In addition, the AIC of qGEEME043 (11.61) and qGEEME044 (11.61) were the lowest.
Based on 309 QTL, 69 meta-QTL related to oil contents were mapped, which were located on the 17 chromosomes (except for the 10th, 11th, and 20th chromosomes). Of these QTL, 24 QTL were located in the regions with no more than 1 cm length. Furthermore, the QTL, qOILME131, and qOILME142 contained only 0.04 cm intervals.
We obtained 73 QTL related to protein-based on metaanalysis, which was located on 19 LG (except for 14th LG). The QTL, qPROME152 contained an 0.16 cm region, and qPROME114 was in a 0.19 cm region. A total of 2 QTL (qPROME162 and qPROME163) exhibited the lowest AIC value (8.13).

Meta-Analysis
This study showed that the meta-analysis method can map stable QTL, which can offset the limitation of traditional approaches of QTL mapping: most of QTL were environmental-dependent loci which might not be effective in the environmental conditions and difficult to utilize in breeding. In this study, a total of 212 meta-QTL was obtained in this research, including 122 stable QTL. Of them, 70 meta-QTL related to isoflavones were mapped for the first time by the meta-analysis method in investigating the genetic background of isoflavones accumulation. Moreover, 69 QTL associated with oil and 73 QTL linked with protein were also acquired in this study. Several meta-QTL associated with oil or protein were mapped. The present study not only examined more previous research data and used a higher quality genetic map but also obtained more meta-QTL and more accurate intervals, compared with the previous researches. Therefore, based on the meta-analysis to integrate QTL is an efficient way to obtain the stable QTL, as well as of crucial importance in understanding the complicated quantitative characters. The mean interval length of these meta-QTLs was 1.50 cm while the length of original QTL collected from the previous studies was 11.00 cm. This indicated that the interval length of meta-QTL was obviously shorter compared with the original QTL. Moreover, of these 212 meta-QTL, 122 QTL were stable and consentaneous QTL. The original QTL was derived from two or three environments in general, making it difficult to map the stable QTL. The meta-analysis methods could remedy the limitation of the traditional methods for QTL mapping, for it could integrate the QTL based on different parental lines, diverse mapping populations, multiple environments, and various experimental designs. Thus, the meta-analysis methods and integration of QTL are powerful and assistant tools for the investigation of QTL, which could effectively promote the process of QTL mapping. There were two stable meta-QTL: qGEEME062 and qAGLME061. The qGEEME062 (112.39-113.38 cm) was derived from qGEEm01, qGEEm37, and qGEEm44 (Primomo et al., 2005;Smallwood et al., 2014;Wang et al., 2015), and involved three pairs of parental lines and 11 kinds of environmental conditions; and the qAGLME061 (112.4-113.38 cm) was mapped from qAGLm02 and qAGLm18 (Primomo et al., 2005;Smallwood et al., 2014), and was obtained from the 4 parental lines and 8 environments. Since these two QTLs were almost overlapped, this study suggested that they may contain the common major genes. Furthermore, qGEEME062 which was related to the contents of GEE, might influence the aglycone contents, for the contents of genistein makes up a large proportion of the aglycone contents. The gene functional annotation provided further evidence for the assumption that: the Glyma.06G207900, encoding a protein involved in the glucose catabolic process, may be the common major gene of these two QTL.
As same as qGEEME062 and qAGLME061, qAGLME132 (101.14-103.13 cm) and qGEEME133 (101.07-103.07 cm) were supposed to be the same QTL or (and) contained the common major genes in this research as well. The reasons are: (1) they were partly overlapped; (2) the phenotypes of these two QTL were significantly correlated. Furthermore, the gene, Glyma.13G203900, could be the common major gene of these two QTL, for it was involved in the flavonoid biosynthetic process according to the GO database. Meanwhile, the Glyma.13G202500 and Glyma.13G203800, which could influence the accumulation of anthocyanin and lignin, were also the candidates, for the anthocyanin and lignin had the common precursors of isoflavones and obstructing the anthocyanin branch pathway could promote the accumulation of isoflavones (Yu et al., 2003).
One interesting finding was that a short region on the 7th chromosome contained three relational QTL, which were qAGLME072 (34.56-36.54 cm), qGEEME071 (34.89-36.84 cm), and qGLEME071 (34.96-36.85 cm). qAGLME072 were derived from qAGLm05 (Primomo et al., 2005) and qAGLm12 (Zeng et al., 2009). qGEEME071 was obtained from 5 original QTL including qGEEm02, qGEEm03 (Primomo et al., 2005), qGEEm11 (Zeng et al., 2009), qGEEm52 , and qGEEm55 . qGLEME071 was analyzed based on 6 original QTL: qGLEm06, qGLEm07 (Primomo et al., 2005), qGLEm11 (Zeng et al., 2009), qGLEm58, qGLEm60 FIGURE 6 | The visualization of correlations between pathways and quantitative trait locus (QTL) (the black dots refer to pathways, the blue dots refer to the QTL associated with isoflavones, the gray dots present the QTL linked with protein, the yellow dots mean the QTL related to oil, and the green dots present to the QTL associated with protein plus oil). , and qGLEm61 . These original QTLs were mapped from the 11 environments and two pairs of parental lines: AC756 × RCAT Angora and Zhongdou27 × Jiunong20. This study supported that they were the same QTL, which was similar to qGEEME062 and qAGLME061. Combing with the results of gene calling and functional annotation, this research further inferred that this region may contain gene(s) directly related to the isoflavonoid biosynthesis pathway, for the candidate gene, Glyma.07G112700, located on this region could encode a protein with 4-coumarate-CoA ligase (4CL) activity. The 4CL, a key enzyme in the phenylalanine metabolic pathway (Yu et al., 2003), could directly catalyze the biosynthesis of lignin which was the precursor of aglycone isoflavones. This implied that our assumption might be true. The previous research identified a total of four 4cl genes in soybean: Gm4CL1 (on the 17th chromosome), Gm4CL2 (on the 13th chromosome), Gm4CL3 (on the 11th chromosome), and Gm4CL4 (on the 1st chromosome) (Lindermayr et al., 2002). Therefore, Glyma.07G112700 might be a new homologous gene of the Gm4CL family. Meanwhile, this research did not exclude the possibility that these QTL could affect the aglycone isoflavones indirectly. Such as Glyma.07G111700, a candidate gene in the interval, may encode a protein with protein kinase activity, which might regulate the activity of several enzymes in the isoflavone biosynthesis pathway like GmMPK1 (Wu et al., 2020).

Novel Quantitative Trait Locus Identification
Of the 41 QTL mapped in this paper, 38 QTL were novel while three (qPRO0902, qPRO1101, and qOIL1702) had been identified in previous research (Tajuddin et al., 2003;Liang et al., 2010;Lu et al., 2013). This study identified the three reported QTL with smaller interval length (0.00-0.45 cm), while the previous researches had mapped QTL with bigger interval (8.30-20.00 cm) (Hyten et al., 2004;Liang et al., 2010;Lu et al., 2013). Meanwhile, these three reported QTL also indirectly identified that the major genes of these QTL could be effective on multiple conditions, for the experimental conditions of both current and previous studies were different. Consequently, these QTL and its major genes might be greatly useful for soybean genetic improvement. Additionally, all other QTL, except qPRO0201 (12.32 cm), were also identified in the smaller interval (0.32-0.97 cm) compared with the previous researches of (1.35-101.74 cm) (Gutierrez-Gonzalez et al., 2010Leite et al., 2016;Zhou et al., 2016;Pei et al., 2018;Lee et al., 2019). The smaller interval of QTL will promote the investigation of major genes, as Liang (2021) demonstrated that the more precise and accurate interval of QTL could make QTL more closely to the quantitative trait gene (QTG) (Liang et al., 2021). Furthermore, though the several QTL mapped in the previous researches contained short interval length (less than 1 cm) as well, the QTL mapped in this paper had greater LODs (average 8.73) compared with the previous researches (2.30-8.85) (Primomo et al., 2005;Pathan et al., 2013;Wang et al., 2015;Asekova et al., 2016;Wu et al., 2020). This indicated that the QTLs of the present study were more credible. The most vital reason for this must be the application of high-density molecular markers and DNA resequencing technology in this study. The QTL identified in this study depended on a high-density genetic map constructed by 1,093,273 SNP makers, while low-density genetic maps and SSR and RFLP markers were utilized in the previous studies (Liang et al., 2010;Lu et al., 2013;Mao et al., 2013). In addition, the PVE of qDA0403, qGL1102, qDA0403, qGLU0403, qAGL0403, qTIF0403, qDA0502, qDAE0502, qGLU0502, and qTIF0502 was high and stable in various experimental locations   the total content of GEE, GE, acetylgenistin, and malonylgenistin) (Yoshikawa et al., 2010). We argued that qGL1102 and KGl_2 might work together to participate in the pathway or contain common major genes, based on the following reasons: (1) acetylgenistin and malonylgenistin derived from GL, and GLE were the precursor of GL; (2) there were few candidate genes on the non-overlapping region between qGL1102 and KGl_2. This study also supported that the major genes of and KGl_2 were regulatory genes, for many candidate genes obtained from them involved the regulation process, such as RNA splicing. The other stable QTL, qDAE0403, was located in the region of 50734438 bp and 51188701 bp on the 4th chromosome. Five QTL were also mapped in this region, including qDA0403, qGLE0403, qGLU0403, qAGL0403, and qTIF0403. With the combination of the functional annotation, this study supposed that Glyma.04G244100.1 and (or) Glyma.04G239000.1 were the common QTG of these six QTL. Because both these genes participated in the glycometabolism and encode enzymes with protein serine/threonine kinase activity (GO:0004674), according to the GO database. Although these enzymes does not participate in the isoflavones biosynthesis pathway directly, they may regulate the activity of sucrose synthase (SUS) by phosphorylating and dephosphorylating the serine (Huber et al., 1996), while the SUS could catalyze the biosynthesis of UDP-Glucose which could react with aglycones to form the glucoside isoflavones (Köster and Barz, 1981). Therefore, the candidate genes of these six QTL, Glyma.04G244100.1 and (or) Glyma.04G239000.1, could influence both the catabolism of aglycones and anabolism of glucosides by regulating the biosynthesis of the UDP-Glucose, and thus, may influence both the individual and total isoflavone contents. In addition, the Glyma.04G242200.1, Glyma.04G239000.1, and (or) Glyma.04G239000.2 could be the major genes as well, for they are involved in the process of response to auxin (GO:0009733) which is a phytohormone influencing the accumulation of isoflavones (Luczkiewicz et al., 2014).
The present study also found several QTLs linked with different traits have existed in the same regions. This demonstrated that the genes located on these QTL could play an important role in the isoflavone biosynthesis pathway, such as encoding key enzymes that participated in the biosynthesis pathway directly, encoding transcription factors controlling a key step of the pathway indirectly, or regulating an important reaction in epigenetic levels mediately.
One unanticipated finding was that both qAGL0501 and qOIL0501 was located in the same region, and both the additive effect of qAGL0501 (6.00) and qOIL0501 (0.14) was positive in Harbin, which confirmed the results of the previous research that positive correlations between oil and aglycone might be existed genetically . These QTL may contain several genes, respectively, related to isoflavones and oil contents, or contain several genes involved in both the isoflavones and oil biosynthetic pathways. Additionally, this study also found that the qAGL0501 exhibited significant LOD scores in the Hailun location, but the additive effect was -183.95 in Hailun. Based on the striking contrast between the additive effect of qAGL0501 in Hailun and Harbin, it is hypothesized that qAGL0501 might be regulated by temperature or involved in the response to low-temperature stress. The gene annotation may verify these inferences. According to the KEGG database, the qAGL0501 may participate in the plant hormone signal transduction pathway (pathway ID: ko04075) and the candidate gene (Glyma.05G241600) were involved in the pathway. Further combining with the Nr database and GO database, these genes may synthesize histidine kinase 4-like protein and involve 59 biological processes including the regulation of anthocyanin metabolic process (GO:0031537), cellular response to cold (GO:0070417), and fatty acid beta-oxidation (GO:0006635). It is well-known that the anthocyanin metabolic process is a pathway closely related to genistein biosynthesis (Yu et al., 2003), and the fatty acid beta-oxidation pathway is the key process of oil accumulation. The positive additive effect of qAGL0501 in Harbin (6.00) but the negative effect in Hailun (-183.95) might be due to the differences in cold stress response. Therefore, Glyma.05G241600 is the common major gene of qAGL0501 and qOIL0501 and is involved in three biological processes including isoflavone biosynthesis, oil accumulation, and response to lowtemperature stress.
The QTL, qPRO0902, involved any pathway according to the KEGG database. We supposed that qPRO0902 was a falsepositive result based on the following reasons: (1) qPRO0902 was not involved in any pathway according to the annotation with the KEGG database in the present study; (2) the LOD of qPRO0902 was three which did not indicate an extremely significant position; (3) the three candidate genes (Glyma.09G054500, Glyma.09G054600, and Glyma.09G054700) of qPRO0902 could not encode characterized protein, according to the BlastX. Nevertheless, the genes of qPRO0902 may contain new functions, for the annotation based on the Swissport database indicated that Glyma.09G054700 had homologous genes in mice.

Summary and Further Research Avenue
Isoflavone, soy protein, and soybean oil are momentous quality traits in soybean breeding, for they play a crucial role in human health. In this study, the high-density genetic map constructed by whole-genome resequencing was used, and the QTL with short intervals were mapped. The expression analysis based on the public transcriptome data was adopted, and several major genes has been predicted. For the first time, the meta-analysis method was used to investigate the genetic background of isoflavones. A total of 41 QTL (containing 660 genes) associated with 12 kinds of quality traits were obtained on the basis of linkage analysis between phenotypes and the high-density genetic map of ZH RIL, and a total of 212 meta-QTL were mapped based on the public genetic map and meta-analysis.
This finding could promote the understandings of the biosynthesis and regulation of isoflavones, protein, and oil more clearly at the molecular level, and facilitate the molecular breeding for great quality traits in soybean. Meanwhile, finemapping qPRO0201 using secondary mapping populations from ZH RIL, and candidate genes identified using the RNA-seq with different expression analysis (DE analysis) and weighted gene co-expression network analysis (WGCNA) could be required for future research.

DATA AVAILABILITY STATEMENT
The data presented in the study are deposited in the NCBI repository, accession numbers: PRJNA778303 and PRJNA778408.