QTL for Main Stem Node Number and Its Response to Plant Densities in 144 Soybean FW-RILs

Although the main stem node number of soybean [Glycine max (L.) Merr. ] is an important yield-related trait, there have been limited studies on the effect of plant density on the identification of quantitative trait loci (QTL) for main stem node number (MSNN). To address this issue, here, 144 four-way recombinant inbred lines (FW-RILs) derived from Kenfeng 14, Kenfeng 15, Heinong 48, and Kenfeng 19 were used to identify QTL for MSNN with densities of 2.2 × 105 (D1) and 3 × 105 (D2) plants/ha in five environments by linkage and association studies. As a result, the linkage and association studies identified 40 and 28 QTL in D1 and D2, respectively, indicating the difference in QTL in various densities. Among these QTL, five were common in the two densities; 36 were singly identified for response to density; 12 were repeatedly identified by both response to density and phenotype of two densities. Thirty-one were repeatedly detected across various methods, densities, and environments in the linkage and association studies. Among the 24 common QTL in the linkage and association studies, 15 explained a phenotypic variation of more than 10%. Finally, Glyma.06G094400, Glyma.06G147600, Glyma.19G160800.1, and Glyma.19G161100 were predicted to be associated with MSNN. These findings will help to elucidate the genetic basis of MSNN and improve molecular assistant selection in high-yield soybean breeding.


INTRODUCTION
Since soybean was one of the important crops worldwide, it has been an ongoing aim for soybean breeders to breed high yield cultivars in order to meet increasing global demand. As a major plant architecture trait, main stem node number (MSNN) affects soybean seed yield (Yao et al., 2015;Chang et al., 2018), for it is related with seed yield characters, such as logging, number of pods per plant, and days to flowering (Chapman et al., 2003;Zhang et al., 2004;Egli, 2013). MSNN is a typical quantitative trait, and the interaction of the genotype and the environment complicates the study on genetic basis. Therefore, molecular markers are widely used to locate quantitative trait loci (QTL) to reveal the molecular mechanism of MSNN in soybean yield. To date, Soybase (https://www.soybase.org/) has listed 37 QTL for MSNN by genetic linkage analysis (Zhang et al., 2004;Chen et al., 2007;Gai et al., 2007;Li et al., 2010;Liu et al., 2011;Moongkanna et al., 2011;Yao et al., 2015) and 11 quantitative trait nucleotides (QTNs) by genome-wide association study (GWAS) (Fang et al., 2017).
Genetic linkage analysis is an effective and traditional method to identify genetic intervals that associated plant phenotypes of traits (Tanksley et al., 1992). With the further development of DNA chip technology, single nucleotide polymorphism (SNP) has been widely used in high-density genetic linkage map construction and mapping of QTL Kim et al., 2010;Akond et al., 2013;Jun et al., 2014;Lee et al., 2015). GWAS can identify QTNs in genome regions based on the density of SNP, with advantages of high detection accuracy, high throughput, low cost, and time-saving. However, high false positive ratio is its inevitable defect. More researchers supported the viewpoint that the combination of linkage and association analysis was more accurate and effective than single methods (Ott et al., 2011;Liu et al., 2018;Fang et al., 2020;. However, identifying the location of MSNN QTL for soybean using the combination of linkage analysis and GWAS analysis has not yet been discussed. Population selection is the most important foundation for the linkage and GWAS analyses to map QTL. Linkage analysis typically is based on populations derived from two parents, and its detection power is usually relatively lower because of its less genetic diversity (Zhang S. et al., 2017). GWAS analysis generally uses natural populations or germplasm resources, and the population structure problem reduces the accuracy of the results. In order to solve the problems, scientists suggested constructing a special population, such as multi-parent advanced generation inter-cross (MAGIC) (Kover et al., 2009). The great opportunity for recombination in multiple parent populations increases mapping accuracy. Abundant genetic variation improves the efficiency of the detection of QTL, and clear kinship in progenies solves the population structure problem (Cavanagh et al., 2008). Kover et al. (2009) first constructed a MAGIC population with 19 Arabidopsis thaliana parents, and proved that the MAGIC population had great advantages in the location of QTL by mapping several known QTL with high precision. Huang et al. (2012) created a wheat MAGIC population from four excellent Australian varieties and identified QTL for plant height and hectoliter weight successfully. Butrón et al. (2019) also identified QTL for resistance to Fusarium ear rot in a MAGIC maize population.
In this study, in order to identify more accurate QTL and further perform gene mining precisely, an FW-RIL derived from a four-way cross was used to identify QTL for the MSNN of two densities in five environments by the combination of linkage analysis and GWAS. This research will enrich MSNN QTL and improve the precision of gene mining, as well as reveal the molecular mechanisms of MSNN in response to density, which will subsequently lay the foundation for marker-assisted selection breeding to increase soybean yield.

Plant Materials
To construct a four-way recombinant inbred line population, four soybean varieties with different node numbers in the main stem, Kenfeng 14, Kenfeng 15, Heinong 48, and Kenfeng 19, were used as parents. In 2008, two  The parents and FW-RILs were planted in a three-row 5 × 0.7 m plot in a split block design of three replications. The main block arranged the plant densities, namely, 2.2 × 10 5 plants/ha (D1) and 3 × 10 5 plants/ha (D2). The sub-blocks were planted lines. The management procedures followed the normal production practices.
Five mature plants of the four parents and 144 four-way recombinant inbred lines (FW-RILs) were selected randomly in the middle of each row to measure MSNN before the harvest in the field for each replication. MSNN indicated the number of nodes from the cotyledonary node to the top of the main stem. The average of the three replications was used for phenotypic data analysis.

Genotyping and SNP Map Construction
Juvenile leaves were frozen in liquid nitrogen from the parents and FW-RIL plants, and then were ground into powder. Total genomic DNA was extracted with the CTAB method (Doyle et al., 1990) and eluted in 50-µl deionized water. SNP genotyping was conducted with SoySNP660K BeadChip at Beijing Boao Biotechnology Co. Ltd. A total of 109,676 SNPs were selected from 600,010 across 20 chromosomes, with minor allele frequency (MAF) > 0.05 and maximum SNP deletion locus < 20% as criteria for the screening of SNP quality, and heterozygous loci were marked as missing to better estimate marker effect. Then, the locus was selected at each 100 kb interval along each chromosome from 3 ′ -bottom to 5 ′ -bottom. 2,292 high-quality SNPs on 20 chromosomes following Mendelian segregating ratio was applied to construct linkage map by the software GAPL V1.0 (Zhang S. et al., 2017). The length of the 20 linkage groups ranged from 76.4 to 329.7 cm, and the total length was 3,539.7 cm. The markers in each linkage group ranged from 16 to 316, with an average interval distance of 4.09 cm (ranging from 1.92 to 10.93 cm).

Phenotypic Variation Analysis
The maximum, minimum, and standard deviations, skewness, and kurtosis of MSNN were calculated for each density in each environment. ANOVA was conducted with SAS V 9.2. ANOVA for single environment was carried out according to the following equation: x ijr = µ + R r + D j + RD jr + G i + GD ij + ε ijr where x ijr is the rth observation of the ith genotype under the jth density in an environment; µ is the grand mean; R r is the effect of main block r; D j is the effect of density j; RD jr is error of main block; G i is the effect of genotype i; GD ij is the interaction effect of genotype i by density j; and ε ijr is the residual error, ε ijr ∼ N(0, σ 2 ).
For multiple environments, joint ANOVA was conducted according to the following equation: where x eijr is the rth observation of the ith genotype under the jth density in eth environment; µ is the grand mean; E e is effect of eth environment; E e (R r ) is the effect of rth main block in eth environment; D j is the effect of density j; ED ej is the interaction effect of density j by environment e; E e (RD rj ) is error of main block in eth environment; G i is the effect of genotype i; GD ij is the interaction effect of genotype i by density j; GE ei is the interaction effect of genotype i by environment e; GDE eij is the interaction effect of genotype i by density j by environment e; and ε eijr is the residual error, εe ijr ∼N(0, σ 2 ).
Genotype variance, genotype × density interaction variance, and error variance were estimated via a mixed model. The heritability (h 2 ) for single environment was calculated with the following equation: The heritability (h 2 ) for multiple environments was calculated with the following equation: where h 2 is heritability; σ 2 G is the variance of genotype; σ 2 GD is the variance of genotype × density interaction; σ 2 GDE is the variance of genotype × density × environment interaction; σ 2 is the variance of error; e is the number of environments; d is the number of planting density; and r is the number of repetitions.

Response to Density Estimation
Response to density refers to the difference in node number in response to change in density (from D1 to D2). Response to density (RD) could be evaluated according to the conditional variable method (Zhu, 1995) with the following equation: where R D is the response to density; x D1 is the phenotype value under the density of D1;x D2 is the phenotype value under the density of D2; C D1D2 is the covariance between phenotypes of MSNN under the two densities; andx D1 and V D1 are the average and variance of MSNN under the density of D1, respectively.

Linkage Analysis
Based on the SNP linkage map constructed above, interval mapping (IM) and inclusive composite interval mapping (ICIM) methods were used to map the QTL for MSNN in every density and environment through the PLQ function of GAPL software V1.0 (Zhang S. et al., 2017). In order to determine the existence of QTL, the scanning step was set to 1 cm, and the likelihood of odds (LOD) threshold was set to 3. The QTL were named qlNN-chromosome-sequence number or qlRDNN-chromosome-sequence number. The QTL mapped to the same marker region were given the same sequence number. QTL mapping results were mapped on chromosomes with MapChart2.1 (https://www.wur.nl/en). For QTL for MSNN detected in one interval, QTL by density effect in each environment, i.e., the additive effect over two densities, and additive × density interaction effect, were estimated. The formulas are shown as follows: where y ijk is the kth phenotype of ith allelic genotype in jth environment, µ .. is the grand mean of all observation, µ ij is the mean of ith allelic genotype in jth environment, G i is the ith allele effect genotype of putative QTL, D j is the jth density effect, GD ij is the QTL × density interaction effect of ith allele genotype under jth density, σ 2 G is the genetic variance, σ 2 E is the variance of density effect, σ 2 GD is the variance of the QTL × density interaction effect, σ 2 p is the phenotypic variance, and g, d, and r are the numbers of allelic genotype, density, and replication. On the basis of estimated σ 2 G , σ 2 D , σ 2 GD , and σ 2 p , the phenotypic variation explanation ratio (%) of additive (PVE A ) and additive × density interaction (PVE AD ) effect were estimated by the following formula:

Genome-Wide Association Studies
The analysis of population structure was performed with the software STRUCTURE V 2.3.4. The number of subpopulations value (K) was determined with STRUCTURE HARVESTER (http://taylor0.biology.ucla.edu/structureHarvester). Linkage disequilibrium (LD) was analyzed with TASSEL 5.0. The K value was 2, and the LD was 1.63 Mb. The procedure is described in detail in a previous study . Then, the GWAS was conducted with the software mrMLM.GUI V3.0 (Zhang Y. W. et al., 2020). Five multiple locus GWAS methods, mrMLM (Wang et al., 2016), FASTmrMLM , FASTmrEMMA (Wen et al., 2018), pLARmEB , and ISIS EM-BLASSO , were used to identify significant QTL that control MSNN and its response to density. The probability P in the first step was set at 0.01 for mrMLM, FASTmrMLM, pLARmEB, ISIS EM-BLASSO, and 0.005 for FASTmrEMMA. The critical LOD score was set at 3 to determine significant QTL. The QTL were named qnNN-chromosome-sequence number or qnRDNN-chromosome-sequence number.

Candidate Gene Prediction
The QTL used to search candidate genes should satisfy the following conditions: (1) for QTL detected by linkage: should be detected in different densities, methods, or environments; explain the phenotypic variation more than 10%, and the interval length should be <600 kb; (2) for QTL detected by GWAS: should be detected in different densities, with more than two multiple locus GWAS methods, in multiple environments, or by co-location with QTL; and explain the phenotypic variation more than 10%. The Glyma.Wm82.a2.v1 gene model in Soybase (https://soybase.org/) was used to identify genes at the interval of each of the QTL (at the interval of 100 kb on either side, determined by the rate of LD decay). According to the Phytozome website (https://phytozome.jgi.doe.gov), genes highly expressed in the stem or shoot tip were selected among them. Then, the selected genes were put together to conduct pathway analysis on the Kyoto Encyclopedia of Genes and Genomes (KEGG) website (http://www.kegg.jp). Finally, the candidate genes were predicted through the results of pathway analysis combined with their homologous genes information on other crops and potential functions in GO number (https://www.ebi.ac.uk/QuickGO/) and the NCBI database (http://www.ncbi.nlm.nih.gov/).

Phenotypic Analysis
The summary of MSNN phenotype is presented in Table 1. The data showed that the node number of the parents and FW-RIL varied with density and environment. The range of FW-RIL covered the parents, which indicated strong bilateral transgressive segregation. The skewness and kurtosis values of the FW-RIL ranged from −1 to 1, and the phenotypic data displayed a typical normal distribution (Figure 1). All the characters of phenotypic variation indicated that MSNN was controlled by large-and small-effect QTL. The significant genotypic and genotype × environment interaction variance indicated a substantial genetic variation of MSNN existed among the FW-RILs and response of genotypes to environment varied among different environments. On the basis of the significant genotype × density interaction variance and genotype × density× environment interaction variance, it was implied that the MSNN response differed to the densities and that the response varied among the environments ( Table 2).
Comparing the two densities, the mean of MSNN in D2 was higher than that in D1 (Figure 2), indicating the existence of MSNN response to density. The difference in heritability among

Mapping of QTL for MSNN
In this study, 38 QTL for MSNN were detected on 18 chromosomes (except chromosome 7 and 15) with LOD value of over 3, which explained 3.44-14.93% of phenotypic variance (Figure 3, Supplementary Table 1). Among these QTL, 13 were identified in D1, seven in D2, and five in both D1 and D2. For 14 QTL underlying the response of MSNN to density, three and three were associated singly with MSNN in D1 and D2, and one of which was associated simultaneously with MSNN in D1 and D2 (Figure 4).
Among In the five environments, the total of PVE A and PVE AE varied extremely, ranging from 1.97 (in E2) to 45.98% (in E5). It was shown that the genetic basis of MSNN response to density varied in different environments (Table 4).

Candidate Gene Prediction
In this study, genes were screened based on the physical position of the five stable QTL (genome length < 600 kb and PVE > 10%) and the 27 stable one (PVE > 10%) mentioned above. In total, 549 genes were found, among which 265 were highly expressed in the stem or shoot tip. Then these genes were used to conduct pathway analysis in the KEGG database (http://www.kegg.jp).

Superiority of Using FW-RIL Population
In this study, a FW-RIL population was used for mapping, a simple mode of MAGIC (Kover et al., 2009) with four parents. It kept the advantage of the MAGIC population in abundant genetic variation. When mapping QTL, it could be applied to analyze allelic additive effects value from four parents. As long as there were differences between any two parents, QTL could be detected. For example, qlNN-1-1, qlNN-8-2, and qlNN-11-1 could not be detected in a bi-parent population derived from Kenfeng 14 × Kenfeng 15, because the allelic additive effects from the parents were approximately equivalent (Supplementary Table 1); so, the more allelic additive effect differences in the FW-RIL population, the greater the improvement made in QTL detection. Moreover, FW-RIL was an artificial population without the population structure problem, which was suitable for GWAS analysis.
Although the population size was relatively small, the combination of the linkage and GWAS analyses could improve mapping power. Furthermore, the experiment was conducted in three environments, which could compensate for the shortage in lower power in single environments. In summary, the statistical methods and multiple environment design could increase QTL detection power.

Combination of Genetic Linkage Analysis and GWAS Analysis
Both the genetic linkage and GWAS analyses were the main methods to identify genome regions related to quantitative traits. As mentioned above, research studies have already combined the two methods to conduct target trait location analysis. In soybean, the combination of the two methods was used in several traits, such as seed size and shape (Hu et al., 2013), seed protein and oil content , number of pods , and plant height (Fang et al., 2020). Similar to the MAGIC population, four parents carried multiple allelic genotypes in FW-RIL, so it could conduct linkage and association analysis Li et al., , 2020Li et al., , 2021Liu et al., 2019;Qi et al., 2020;Song et al., 2020;Tian et al., 2020;Wang et al., 2021). First, the principles of linkage and association were different: the former associated an interval (region) with a target trait, and the latter associated a position (SNP) with a target trait. Second, the genotype data used in linkage and association analysis were different: the former was based on a small number of markers, and the latter depended on the large amount of markers over the whole genome, so the combination of linkage and GWAS could increase the identification of genomic regions associated with target traits in FW-RIL. In this research, the IM and inclusive composite interval mapping methods were used in the linkage analysis, and mrMLM, FASTmrMLM, FASTmrEMMA, pLARmEB, and ISIS EM-BLASSO were used in the GWASs. A total of 38 QTL were identified by linkage analysis and 81 QTL were identified by GWAS, and 24 QTL were co-located in the interval of the identified QTL. The results indicated that the difference of the two methods in statistical principles and the genetic basis could complement each other and facilitate the detection of QTL.

Density Response of MSNN
Plant density is considered to be an important factor affecting soybean yield and yield components, such as MSNN. Ikeda et al. (1994) reported that soybean yield increased as density increased because of increase in total node number, especially branch node number. In this study, the MSNN of most of the lines increased as the density increased. Some lines showed the opposite response to density increase, indicating that the expression of gene for MSNN was probably affected by the change in density. By combining linkage and GWAS analyses, 55 QTL were identified in D1, and 33 QTL were identified in D2, respectively. Only five of them were identified in both of the densities, while the rest were detected in the single density. The results showed that the genetic basis of the QTL for MSNN was significantly different in the two densities, and for the genotype, environments, densities and their interaction were all at work. Inspired by the conditional genetic effects (Zhu, 1995) based on a net-effect analysis, the effect of MSNN response to density was estimated by the removal of other factors except planting density increment. In total, 48 QTL for MSNN response to density were identified when planting

Candidate Gene Related With MSNN
It is known that only few genes were directly related to MSNN in different crops. A novel ricMT gene was highly expressed in stem nodes (Yu et al., 1998). ZmMADS3 was expressed in the stem nodes of maize, and the transgenic maize reduced the number of nodes (Heuer et al., 2001). Dt1 controlled the number of nodes in soybean by regulating stem growth habit (Bernard, 1972). Therefore, it is of great significance to explore potential candidate genes for MSNN. In this study, four among 106 genes were predicted for MSNN. Brassinosteroids are essential plant hormones with significant effect on cell proliferation and elongation. Glyma.06G147600 was annotated as protein brassinosteroid insensitive 1 (BRI1). It has been demonstrated that BRI1 is a receptor kinase that transduces steroid signals across the plasma membrane, which is likely to be the primary brassinosteroids (BR) receptor in Arabidopsis (Wang et al., 2001). Glyma.06G094400 was annotated as 14-3-3 protein epsilon. 14-3-3 proteins were highly conserved regulatory proteins, which interact with diverse target proteins in a sequence-specific and phosphorylationdependent manner (Bridges and Moorhead, 2005). They have been proved to be involved in many processes of metabolism, hormone signaling introduction, cell division, and responses to abiotic and biotic stress in plants (Chen et al., 2006;Takahashi et al., 2007;Swatek et al., 2011). 14-3-3 proteins participate in BR signal transduction by regulating the subcellular localization and activity of both BZR1 and BZR2/BES1, which are the key transcription factor of BR signal transduction (Gampala et al., 2007). Chae et al. (2016) found that 14-3-3 proteins bound to BRI1, a kind of BR-receptor kinase, and phosphorylated in a BR-dependent manner, demonstrating that 14-3-3 proteins play an important role in the BR signaling of A. thaliana. Therefore, Glyma.06G094400 and Glyma.06G147600 could play an important role in MSNN because they probably would regulate stem growth through BR signaling pathway. Glyma.19G160800 is annotated as tryptophan synthase alpha chain. Tryptophan synthase is an enzyme that catalyzes the final two steps in the biosynthesis of tryptophan, which could be converted to indole acetic acid (IAA) via the indole acetaldehyde or indole acetonitrile pathway. Glyma.19G161100 is annotated as auxin-responsive protein IAA. In other words, the two genes function in plant growth by IAA indirectly or directly. IAA is well-known for its strong effect on stimulating elongation in isolated stem segments (Yang et al., 1996), which has previously demonstrated that stem elongation strongly responded to exogenous IAA in light-grown pea (Murayama and Ueda, 1973;Yang et al., 1993). Recent research studies further showed that an auxin gradient was involved in cell proliferation in Arabidopsis and rice , and that auxin could convert to other forms to keep homeostasis to regulate soybean stem growth and development through various pathways (Jiang et al., 2020). Obviously, the two genes have a function in soybean stem growth and might have a certain relationship with MSNN. The four genes are all related with plant hormone signal transduction. It is necessary to verify the function of these genes in the future.

CONCLUSION
In this study, by combining linkage analysis and GWAS analysis, a total of 119 QTL associated with MSNN were identified in the FW-RIL population. Among them, 24 were simultaneously identified by the two methods. On the basis of the five QTL repeatedly detected in D1 and D2 and the 36 QTL for MSNN response to density, it was implied that a specific molecular mechanism controlled the MSNN response with the increase in plant density. In addition, 109 QTL were newly found, and four candidate genes were predicted to be closely related to MSNN. These genes could be of great value for MAS of soybean breeding.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material, further inquiries can be directed to the corresponding authors.

AUTHOR CONTRIBUTIONS
HN and W-XL: conceptualization. HZ, MS, RR, TY, HL, YH, CL, and XS: investigation. HN: resources, data curation, supervision, and funding acquisition. PW: writing-original draft preparation. BH and HN: writing-review and editing. W-XL: project administration. All authors have read and agreed to the published version of the manuscript.