Detecting QTL and Candidate Genes for Plant Height in Soybean via Linkage Analysis and GWAS

Soybean is an important global crop for edible protein and oil, and plant height is a main breeding goal which is closely related to its plant shape and yield. In this research, a high-density genetic linkage map was constructed by 1996 SNP-bin markers on the basis of a recombinant inbred line population derived from Dongnong L13 × Henong 60. A total of 33 QTL related to plant height were identified, of which five were repeatedly detected in multiple environments. In addition, a 455-germplasm population with 63,306 SNP markers was used for multi-locus association analysis. A total of 62 plant height QTN were detected, of which 26 were detected repeatedly under multiple methods. Two candidate genes, Glyma.02G133000 and Glyma.05G240600, involving in plant height were predicted by pathway analysis in the regions identified by multiple environments and backgrounds, and validated by qRT-PCR. These results enriched the soybean plant height regulatory network and contributed to molecular selection-assisted breeding.


INTRODUCTION
Soybean [Glycine max (L.) Merr.] is one of the most important crops in the world as a major source of protein and oil (Feng et al., 2021). Plant height (PH) as the main trait of soybean plant shape is related to soybean yield (Assefa et al., 2019). Low plants result in lower yields, while too high plants are prone to yield reduction due to lodging. Plant height also affects yield through other traits such as number of pods per plant and number of nodes in the main stem (Chang et al., 2018;Li M. W. et al., 2020). PH wass a complex quantitative trait which was controlled by multiple genes and influenced by environmental conditions (Lee et al., 1996).
With the objective to breeding efficiently, QTL mapping for PH were conducted by linkage and genome-wide association analysis (GWAS) analysis. Based on the bi-parent derived populations and linkage analysis (Xu et al., 2017), 238 QTLs associated with plant height had been listed on all 20 chromosomes. 1 In these researches, most of the linkage maps were constructed by low-throughput molecular markers such as restriction fragment length polymorphism (RFLP), amplified fragment length polymorphism (AFLP), and simple sequence repeat (SSR) markers, which led to low marker density, large genomic region intervals for QTL localization. It was difficult to predict candidate 1 http://soybase.org genes and design marker-assisted selection for PH. With the continuous development of molecular markers, high-throughput and high-density single nucleotide polymorphism markers (SNPs) were used as major markers for linkage analysis for mapping QTL (Adewusi et al., 2017;Gomez-Casati et al., 2017;Zhang et al., 2019;Karikari et al., 2020;Tian et al., 2020;Salari et al., 2021;Silva et al., 2021). In order to construct effective linkage intervals to identify QTL, SNP bin maker technology were gradually used in construction of linkage map. Cao et al. (2019) constructed two linkage maps by 3,958 and 2,600 SNP bin markers for two RIL populations, and identified 8 and 12 PH QTL on chromosomes 2, 5,6,7,9,10,15,16,17, and 19 explaining 1.8-50.7% of the phenotypic variation, respectively. Wang et al. (2021) constructed a high-density map containing 2225 bin markers and detected 39 PH QTLs on chromosome 5, 6, 7, 9, 10, 12, 15, 16, 18, and 20, and the phenotypic variation explanation (PVE) ranged from 1.11 to 18.99 % based on a recombinant inbred line population of soybean. The second method for detecting QTL was GWAS, which has been extensively studied through recombinant inbred lines and germplasm populations of soybean (Lü et al., 2016;Qi et al., 2020;Song et al., 2020). With the objective of overcoming the shortage of false positives (Sonah et al., 2015), combinations of linkage and association analysis were gradually used in detecting genome regions controlling quantitative traits Song et al., 2020;Li X. et al., 2021). However, few studies combining both methods have been conducted on PH of soybean.
In this research, QTL/QTN localization of soybean plant height was performed via linkage analysis of a recombinant inbred lines and GWAS of a 455-germplasm population. In the region of QTL/QTN, candidate genes related to PH formation were predicted and initially validated by qRT-PCR. The objective of this research was to lay foundation for probing genetic basis and molecular assistant selection of PH.

Plant Populations
Two soybean varieties with large differences in PH, Dongnong L13 obtained from a cross between Heinong 40 and Jiujiao 5640 and Henong 60 obtained from a cross between Beifeng 11 and Hobbit, were used as parents to mate cross in 2008 in Harbin, Heilongjiang Province (E 126.63 • , N 45.75 •    FIGURE 2 | The high-density bin map of RIL6013.
Frontiers in Plant Science | www.frontiersin.org sowed in single row on the ridges with the plant space set 0.07.
The whole experiments were managed as the same as local field production. Ten plants were randomly selected from each blot to determine PH at the maturity stage. The average value of the 10 plants was used as the observation value of the plot, and finally the average value of the three blots was used for QTL and QTN mapping.

Statistical Analysis of Phenotype Data
Frequency distribution histograms were drawn from the phenotypic data of PH in each environment and descriptive statistics were performed. Analysis of variance (ANOVA) and estimation of generalized heritability were also performed as Li X. et al. (2021). The statistical model for the multi-environment ANOVA for RBRD was showed as follows: where µ is the grand average, G i is the ith genotype effect, E e is the eth environment effect, R i is the jth replication effect, B k is the kth block in jth replicate, RB jk is the interaction effect between jth replication and kth block, B k (G i ) is ith genotype in kth block, ER ej is interaction between eth environment and jth replication, EB ek is interaction between eth environment and kth block, ERB ejk is interaction effect among eth environment and jth replication and kth block, EB ek (G i ) is ith genotype under interaction of eth environment and kth block, and ε eijk is the error effect following N(0, σ 2 ). The broad-sense heritability in multiple environments was showed as following: where h 2 is the broad sense generalized heritability of average in over multiple environments,σ 2 B(G) is the variance of genotype under block,σ 2 EB(G) is the variance of genotype under environment × block interaction, σ 2 is the error variance, e is the number of environments, and r is the number of repetitions in each environment. Significance of each factor was tested by the general linear model method and variance were estimated using mixed method implemented by SAS 9.2 (SAS Institute, Cary, NC, United States).

SNP Genotyping
DNA samples extracted by CTAB method from RIL6013 were genotyped for SNPs using a soybean SNP660K microarray at Beijing Boao Biotechnology Co., Ltd. A total of 54,836 SNPs were screened on 20 chromosomes. A total of 63,306 SNPs were screened on 20 chromosomes using a soybean SNP180K microarray for SNP genotyping of 455 DNA samples from germplasm resources at Beidahuang Kenfeng Seed Co., Ltd. The obtained SNP markers were screened according to the following criteria: minimum allele frequency for markers (MAF > 5%) and maximum deletion rate <10% for each SNP (Belamkar et al., 2016).

Bin Maker Map Construction and QTL Localization
Here the SNP data from RIL6013 was used to identify possible crossovers via python 2.7snpbinner, and the minimum distance between crossovers is 0.2% of the chromosome length. The aggregated breakpoints generated from the crossover points were then used to create representative bins for the entire population (minimum distance of 30 kb per bin). The obtained bin markers were used to construct a high-density genetic linkage map of SNPbins using the.map function (Linkage map construction) in the software QTL IcimappingV 4.1 (Wang, 2009). QTL IcimappingV 4.1 (Wang, 2009) software was used to locate additive QTL using two mapping methods: interval mapping (IM-ADD) and inclusive composite interval mapping (ICIM-ADD). The scan step was set to 1.00 cM and the LOD threshold was set to 2.50. The PIN value of the ICIM-ADD method was set to 0.001. The QTL were named using the method of McCouch (1997).

Genome-Wide Association Analysis
The population structure and LD of the germplasm resource population were described and published earlier by Li X. et al. (2020). The germplasm resource population consisted of two subpopulations containing 132 (29.01%) and 323 (70.99%) lineages, respectively (K = 2). And the physical distance of LD decay was estimated as the position where r 2 dropped to half of its maximum value, the LD decay distance was estimated to be 86 kb.
Genome-wide association analysis was performed using the mrMLM.GUI package , and the six methods (mrMLM (Wang et al., 2016), FASTmrMLM (Tamba and Zhang, 2018), FASTmrEMMA (Wen et al., 2019), pLARmEB , ISIS EM-BLASSO , and pKWmEB (Ren et al., 2018) were used to detect significant QTN. In the first stage, the critical p-value parameter was set to 0.005 for all methods except FASTmrEMMA, and the critical LOD value for significant QTN was set to 3 in the final stage. The kinship matrix used in the analysis was also calculated by the software itself.

Candidate Gene Prediction
Genomic regions repeatedly identified in multiple environments or two populations were used to predict genes involving in PH formation. Specifically, the genome region of QTL interval localized in multiple environments with a genomic region less than 300 kb and the LD decay distance of 86 kb of the QTN localized within the QTL genomic region were selected, and the genes were searched for by the Phytozome website. 2 The genes 2 https://phytozome.jgi.doe.gov expressed in the stems were then screened. Finally, candidate genes related with PH were identified by combining annotation information of genes, pathway analysis in the Kyoto Encyclopedia of Genes and Genomes (KEGG) 3 and previous studies.

Candidate Gene Validation
Two parents (Dongnong L13 and Henong 60), two varieties (HN400 and HN451) with lower PH and two varieties (HN369 and HN477) with higher PH, were selected in the RIL6013 population based on the PH phenotype data. The qRT-PCR was used to study the relative expression of candidate genes in these six varieties. These varieties were planted in Harbin in the same environment as E1. Stems were sampled at 10day intervals starting from the R1 period when elongation is the fastest. The third node down from the top of the main stem was taken and replicated three times per plant. Total RNA was extracted using the OminiPlant RNA Kit (Dnase I) (CWBIO, Jiangsu, China). Two microgram of total RNA was extracted using the EasyScript R One-Step gDNA Removal and cDNA Synthesis SuperMix kit (TransGen Biotech, Beijing, China). The first strand cDNA was synthesized from 2µg of total RNA using the EasyScript R One-Step gDNA Removal

Molecular Marker Identification
With the objective to verify the effect of gene and develop markers for molecular assistant selection, the markers with polymorphism in the 100k bp interval of the genes were evaluated for the association with plant height in the 455-germplasm population.
The significant differences of averages between allelic genotypes were determined by analysis of variance, and the probability to determine the significance was set 0.05.

Phenotypic Variation Analysis
Phenotypic data collected from 139 lines of RIL6013 in eight environments were analyzed. 455 germplasm resource populations in four environments were analyzed early by Wang et al. (2021). The results of descriptive statistics ( Table 1) showed that the absolute values of kurtosis and skewness were less than 1 in all the eight environments of RIL6013 except E8, which was close to 1. It showed that PH distributed normally (Figure 1). The range of PH in RIL6013 contained those of parents, which indicated a transgressive segregation in the two populations. The coefficient of variation ranged from 13.10 to 22.00% for the RIL6013 population and from 18.03 to 20.31% for the 455germplasm population, which suggested that a wide range of variation in plant height in two populations and a different genetic basis in different environments. The results of ANOVA (Table 2) showed that there were highly significant differences in environment, genotype, and genotype × environment interaction effect, which indicated that PH was influenced not only by genotype and environment but also by genotype by environment interaction effect. Higher broad sense heritability (65 and 72%) was found in RIL6013 and 455 germplasm resource populations, respectively, which indicated that the variation of soybean plant height mainly come from genetic effect.
FIGURE 5 | Plant height and relative expression patterns of candidate genes. Plant height of six varieties at different times express as (A), relative expression of Glyma.02G133000 in six varieties at different times express as (B) and relative expression of Glyma.02G133000 in six varieties at different times express as (C). * p < 0.05, * * p < 0.01, * * * p < 0.001, * * * * p < 0.0001.

Bin Map and QTL Localization for RIL6013
A high-density SNP bin genetic linkage map covered all 20 chromosomes containing 1996 bin markers, and the total length of the map was 2874.72 cM. The number of SNP bin markers per chromosome ranged from 59 to 158, and the length of each linkage group ranged from 82.37 to 238.98 cM. The average number of markers per linkage group was 99.8, and the average distance between markers was 1.48 cM (Figure 2 and Table 3).
A total of 33 QTLs associated with plant height were localized in the RIL6013 population on 12 chromosomes of soybean using two methods IM and ICIM based on bin mapping (Figure 3 and Supplementary Table 3). The number of QTL localized on each chromosome ranged from one (Chr02, Chr04, Chr12, and Chr13) to six (Chr14), with phenotypic contributions ratio ranging from 0.55 to 13.64%. 2, 16, 6, 9, 1, 1, and 6 QTLs were localized in E2-E8, respectively. A total of three QTL (qPH-1-1, qPH-6-2, and qPH-18-4) showed phenotypic contributions ratio more than 10% and can be considered as the main effective QTL for plant height.
A total of five QTLs were localized in multiple environments (Table 4), and the additive effects were all positive, indicating that the parent Dongnong L13 could increase plant height via these QTL. qPH-2-1 was localized on chr02 in E3, E4, and E8 environments with LOD values of 2.87-3.09 and phenotypic contributions ratio of 1.60-6.46%. The genomic region of qPH-2-1 was shorter than 320kb, which is suitable for searching candidate genes.

Co-detected Regions by Linkage Analysis and Association Analysis
The regions detected by GWAS were compared with those of the linkage analysis. The results showed that two QTN loci fell within the genomic region where the two QTLs identified in the RIL6013 population (Figure 4). Among them, AX-90484715 was located within the interval of qPH-5-4 and AX-90349538 was located within the interval of qPH-14-2. Candidate genes were searched within 43 kb flanking these two QTN loci based on LD distance.

Candidate Gene Prediction
Based on the above results, candidate genes were selected to search within 13.66-13.98 Mb on Chr02, 41.55-41.63Mb on Chr05 and 4.01-4.09Mb on chr14. A total of 50 candidate genes were searched, of which 46 genes were expressed in the stems. The pathway analysis on 46 genes showed that a total of 18 genes (39.1%) had annotations (Supplementary Table 4). Based on the annotation information of KEGG and metabolic function information, three potential candidate genes were predicted that may be directly or indirectly related to PH ( Table 5).

Candidate Gene Validation
The relative expression of the three candidate genes in the two parents, HN400, HN451, HN369, and HN477, were characterized by applying qRT-PCR. The plant height of the six varieties continued to grow from R1 to day 30, with highly significant differences in plant height from day 10 after R1 (Supplementary Table 5 and Figure 5A). Relative expression amount (REA) of Glyma. 02G132200 did not differ significantly among varieties at the whole stages, which indicated Glyma. 02G132200 was not directly related to PH. It could be related to the trait from the DNA level or some other pathway. For Glyma. 02G133000, REA of the six varieties increased continuously from R1 to day 20 and started to decrease from day 20 to day 30. REA from day 10 to day 30 of HN369, HN477 and Dongnong L13 was larger significant than that of HN400, HN451 and Henong 60 ( Figure 5B). The expression of Glyma. 05G240600 in the six cultivars continued to increase from R1 to day 30. REA of HN369, HN477 and Dongnong L13 were significantly higher than that of HN400, HN451 and Henong 60 from R1 to day 30 ( Figure 5C).
From the 455-germplasm population, seven and one SNP markers associated with plant height were detected near Glyma.05G240600 and Glyma.02G133000 (Table 6), which indicated that the two genes controlling plant height. Among these markers, AX-90483488, AX-90490846, and AX-90515514 were detected in three environments, while the rest five markers were detected in only one environment. These eight markers could be used improve plant height commonly or specifically.

Improving the Accuracy of QTL Analysis and GWAS by Multi-Environment Experiments and Sufficient SNP Markers
The small amount of RFLP, AFLP, and SSR markers used in previous studies made it difficult to ensure the accuracy of linkage analysis (Singh et al., 2016;Bhat et al., 2020), and most of the previously localized QTL were analyzed in a single environment, which is prone to false positive results (Fang et al., 2020). QTL detected repeatedly in multiple environments are more authentic than those detected in a single environment (Fulton et al., 1997). Here, a high-density genetic map containing 1,966 SNP bin markers was constructed using RIL6013 with the average distance between markers of 1.48 cM, which improved the resolution of the map and facilitated the localization of more QTL and shortened the interval of localized QTL. Phenotypic variation was enriched using an eight-environment experiment at multiple locations over multiple years. And the candidate genes were searched within the stable QTL intervals that were repeatedly localized in multiple environments. Summarizing the above measures, the accuracy of the linkage analysis was improved. For association analysis, more molecular markers could produce a higher probability of detecting functional loci . Multi-locus GWAS methods are effective in reducing falsepositive QTNs compared to single-locus GWAS methods (Qi et al., 2020). An ideal germplasm resource population should contain rich genotypic and phenotypic data (Kaler et al., 2020). Based on the above considerations, the genotype data of 63,306 SNP markers from a natural 455-germplasm population and phenotypic data from four environments were used to conduct multi-locus GWAS analysis, which improve the accuracy of association analysis and reduce the ratio of false positives.

Comparison With Previous Results of Localized QTL
Here, the five QTLs located by RIL6013 repeatedly in multiple environments and the two QTNs identified by a combination of linkage and association analysis were compared with 238 QTLs associated with PH located by previous researches in the soybase database (Figure 4). The interval of qPH-2-1, which is located on chr02, was contained by the interval of Plant height 42-1 (Hu et al., 2013). The interval of qPH-3-5 on chr03 located in the interval of Plant height 26-17 (Sun et al., 2006). The interval of qPH-12-1 on chr12 had a overlapping region with the interval of Plant height 38-7 (Lee et al., 2015). The interval of qPH-18-3 on chr18 crossed the intervals of Plant height 26-13 and Plant height 26-14 (Sun et al., 2006). The genomic region of AX-90484715 on ch05 had a overlapping region with the interval of Plant height 37-1 (Yao et al., 2015). The qPH-1-1 localized on chr01 in three environments, E3, E5 and E8, was a newly identified QTL, which was more than 20 Mb away from mqPlant height-005 (Pathan et al., 2013). The AX-90349538 on chr14 was a newly identified QTN, which was more than 1.9 Mb away from Plant height 34-6 (Kim et al., 2012). In addition, compared with genes related to plant height identified in previous studies, we found that the GA20ox controlling PH formation (Fernandez et al., 2009) was only 170 kb away from AX-90464100. These results support the accuracy of this study.

Further Analysis of Candidate Genes by qRT-PCR
Using annotation information and metabolic function information we initially predicted three candidate genes that might be associated with PH. Applying qRT-PCR technique to identify the relative expression of the three candidate genes in six varieties with significant differences in plant height, it was found that two candidate genes may be associated with plant height. Among them, Glyma.02G133000 is a calcium-binding protein gene involved in calmodulin synthesis, and calmodulin is involved in regulating leaf senescence and ABA response in Arabidopsis affecting plant growth and development (Dai et al., 2018). Glyma.05G240600 is involved in the synthesis of the vesicular iron transporter protein VIT, and iron stored in the vesicles plays a crucial role in the development of plant seedlings; if iron is deficient, it leads to stunted seedlings. Thus, it was speculated to regulate of plant height (Kim et al., 2006). The relative expression of Glyma.02G133000 and Glyma.05G240600 from R1 to day 30 was higher in the high PH varieties than in the low PH varieties, so it was speculated that Glyma.02G133000 and Glyma.05G240600 may have a positive regulatory function on plant height. However, plant height is a complex quantitative trait, and the specific regulation of plant height by these two candidate genes needs to be investigated in follow-up studies.

SUMMARY
Here, five multi-environmental QTL and 26 multi-method QTN were detected by linkage analysis and association analysis, respectively, and two candidate genes associated with plant height were identified by pathway analysis and qRT-PCR validation. These results lay the foundation for marker-assisted selection.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material, further inquiries can be directed to the corresponding authors.

AUTHOR CONTRIBUTIONS
W-XL and HN conceived and designed the experiments. JW, BH, YJ, XH, YG, JC, YL, and JH performed the field experiments. JW and HN analyzed and interpreted the results. JW, BH, and HN drafted the manuscript. W-XL provided the laboratory conditions. All authors contributed to the manuscript revision.