Identification and Validation of Major QTLs, Epistatic Interactions, and Candidate Genes for Soybean Seed Shape and Weight Using Two Related RIL Populations

Understanding the genetic mechanism underlying seed size, shape, and weight is essential for enhancing soybean cultivars. High-density genetic maps of two recombinant inbred line (RIL) populations, LM6 and ZM6, were evaluated across multiple environments to identify and validate M-QTLs as well as identify candidate genes behind major and stable quantitative trait loci (QTLs). A total of 239 and 43 M-QTLs were mapped by composite interval mapping (CIM) and mixed-model-based composite interval mapping (MCIM) approaches, from which 180 and 18, respectively, are novel QTLs. Twenty-two QTLs including four novel major QTLs were validated in the two RIL populations across multiple environments. Moreover, 18 QTLs showed significant AE effects, and 40 pairwise of the identified QTLs exhibited digenic epistatic effects. Thirty-four QTLs associated with seed flatness index (FI) were identified and reported here for the first time. Seven QTL clusters comprising several QTLs for seed size, shape, and weight on genomic regions of chromosomes 3, 4, 5, 7, 9, 17, and 19 were identified. Gene annotations, gene ontology (GO) enrichment, and RNA-seq analyses of the genomic regions of those seven QTL clusters identified 47 candidate genes for seed-related traits. These genes are highly expressed in seed-related tissues and nodules, which might be deemed as potential candidate genes regulating the seed size, weight, and shape traits in soybean. This study provides detailed information on the genetic basis of the studied traits and candidate genes that could be efficiently implemented by soybean breeders for fine mapping and gene cloning, and for marker-assisted selection (MAS) targeted at improving these traits individually or concurrently.


INTRODUCTION
Soybean (Glycine max L. Merr.) is one of the most important food crops, being a rich source of dietary protein (69%) and providing over 50% edible oil globally, and has a significant role in health and biofuel (Hoeck et al., 2003). Besides improving soil fertility by integrating atmospheric nitrogen in the soil through a synergistic interaction with microorganisms, because of its high nutritional value, soybean is used in human food and animal feed . Throughout the last five decades, soybean production in China slightly increased. To meet domestic demands, China imports almost 80% of its requirements of soybean; therefore, improving soybean production is a major aim of soybean breeders to make the country self-sufficient (Liu et al., 2018). Most plant breeders are targeting yield-related traits to improve soybean production.
Seed size is an essential trait in flowering plants and plays a critical role in adaptation to the environment (Tao et al., 2017). However, these traits are complex quantitative traits regulated by polygenes and strongly influenced by environment and genotype × environment (G × E) interaction, and hence it is more difficult to select for based on phenotype (Yao et al., 2014). All soybean varieties developed in tropical and subtropical countries have small seed size compared to the temperate-region varieties. Besides, seed size, shape, and weight are important seed quality traits with significant influence on seed use (Basra, 1995;Teng et al., 2017;Wu et al., 2018). A positive correlation between seed size/weight and seed yield has been reported in several studies. Seed size/weight revealed a positive association with seed germination capability and vigor, thereby significantly affecting the competitive capability of the seedling for nutrient and water resources and light, hence enhancing stress tolerance (Edwards and Hartwig, 1971;Haig, 2013). Dissecting the genetic factors underlying seed size, shape, and weight and their relationship to the ambient environment is essential for improving soybean yield and quality-related traits. In addition, understanding the additive and additive × environment (AE) effects of quantitative trait loci (QTLs) and their contribution to the phenotypic variations would facilitate the application markerassisted selection (MAS) because it will prominently lead the breeders in the QTL selection and expectation of the outcomes of MAS (Jannink et al., 2009).
A major aim of utilizing linkage mapping in plant breeding is to deepen our understanding of the inheritance and genetic architecture of quantitative traits and detect markers that can be employed as indirect selection tools in plant breeding (Bernardo, 2008;Abou-Elwafa, 2016a). In this regard, QTL mapping has been regularly used for detecting the QTL/gene underlying the quantitative traits such as seed size, shape, and weight in crop plants. As known, parental diversity and marker density greatly influence the accuracy and precision of QTL mapping. Besides, the population size used in most of the previously published reports for genetic mapping studies usually varied from 50 to 250 individuals; however, larger populations are needed for high-resolution mapping. A high-density genetic map facilitates the detection of closely linked markers associated with QTLs and provides an effective base for investigating quantitative traits (Mohan et al., 1997;Galal et al., 2014;Tewodros and Zelalem, 2016). The statistical difference between phenotypic data obtained from various environments could enhance the accuracy to detect QTL position (Zhao and Xu, 2012). Previous studies identified important seed size and shape QTLs, which were also associated with hundred seed weight (HSW); however, most of the studies used low-density genetic maps based on restriction fragment length polymorphism, simple-sequence repeat markers, and biochemical and morphological markers which have a large confidence interval with a low resolution of QTLs that are not suitable for candidate gene identification (Bernardo, 2008;Han et al., 2012;Abou-Elwafa, 2016a,b). Therefore, it is crucial to employ high-density genetic maps to detect more new recombination in a population, which will increase the accuracy of QTL mapping, candidate gene identification, and MAS (Mahmoud et al., 2018;Cao et al., 2019;Hina et al., 2020). Recent advances in genetic and genomic tools and approaches have facilitated the identification of QTLs associated with various agronomic traits in different crop species, including soybean, and the identification of candidate genes underlying these genomic regions (Peterson et al., 2012;Sun et al., 2012;Xie et al., 2018;Zhang F. et al., 2019;Hina et al., 2020;Kajiya-Kanegae et al., 2020). Although epistatic interaction has a stronger effect on inbreeding depression, heterosis, adaptation, speciation, and reproductive isolation (Ma et al., 2015), previous studies focused mostly on identifying main-effect QTLs associated with seed sizes, shapes, and 100seed weight in soybean. To date, at least 441, 52, and 297 QTLs for seed size, shape, and HSW have been reported 1 based on various genetic contexts, advances in marker technology, statistical methods, and multiple environments. However, most of these QTLs are minor (R 2 < 10%), not stable, and with larger genomic regions/confidence intervals (Han et al., 2012;Hu et al., 2013;Kato et al., 2014). Recently, there have been limited studies on detecting QTLs with epistatic effects and their interactions with the environment (QEs) (Panthee et al., 2005;Zhang et al., 2011Zhang et al., , 2018Liang et al., 2016). Knowledge about the molecular mechanisms underlying soybean seed size, shape, and weight is still limited. So far, only two seed sizes/weightrelated genes have been cloned and characterized from the soybean, i.e., the Glyma20g25000 (ln) gene that has a significant impact on seed size and number of seeds per pod (Jeong et al., 2012) and the PP2C-1 gene that enhances seed size/weight (Lu et al., 2017). Therefore, it is essential to identify major and stable QTLs and candidate genes related to seed size, shape, and weight to improve our understanding of genetic mechanisms controlling these important traits in soybean (Kato et al., 2014;Zhang et al., 2018). The present study aimed to (i) map main-effect QTLs (M-QTLs), additive × additive (AA) QTLs, and QE for seed size, shape, and weight traits; (ii) employ two QTL mapping approaches to validate the identified QTLs in two mapping populations across multiple environments; (iii) analyze the epistatic QTL pairs and their interactions with the environment for further utilization of these QTLs in soybean genetic improvement; and (iv) mine potential candidate genes for the major (R 2 > 10%) and stable (identified across multiple environments or populations) QTLs. We hypothesize that the results of this study would provide comprehensive knowledge on the genetic bases for these traits and mined candidate genes would serve as a foundation for functional validation and verification of some genes for seed size, shape, and weight in soybean. Besides, the results would be useful for the application of marker-assisted breeding (MAB) in soybean.

Plant Materials and Experiments
Two recombinant inbred line (RIL) populations, i.e., ZM6 and LM6, comprising 126 and 104 lines, respectively, were used in the present study. The two populations were developed by single seed descent (SSD) from crosses between the genotypes Zhengyang (Z) and Linhefenqingdou (L) as female parents and the M8206 (M6) genotype as the male parent. The two female parents, Z and L, have an average 100-seed weight of 17.1 and 35 g, respectively, whereas the male parent has an average 100-seed weight of 13.7 g.
The two RIL populations along with their parents were evaluated for seed size, shape and HSW across multiple environments. Experiments were conducted in the Jiangpu Experimental Station (33 • 030' N and 63 • 118' E), Nanjing, Jiangsu Province, in the 2012, 2013, 2014, and 2017 growing seasons (designated as 12JP, 13JP, 14JP, and 17JP, respectively), the Fengyang Experimental Station, Chuzhou, Anhui Province (32 • 870' N and 117 • 560' E), in the 2012 growing season (designated as 12FY), and the Yancheng Experimental Station, Yancheng, Jiangsu Province (33 • 410' N and 120 • 200' E), in 2014 (designated as 14YC). Plants were sown in June and harvest was done in October of the same year. Experiments were designed in a randomized complete blocks design (RCBD) with three replications. The experimental plot was one row of 2-m-long at 5-cm plant-to-plant distance and 50-cm row-to-row distance. Planting and post-planting operations were carried out following the recommended agronomic practices.

Phenotypic Evaluation and Statistical Analysis
Eight seed-related traits including seed length (SL), seed width (SW), seed thickness (ST), seed length/width (SLW), seed length/thickness (SLT), seed width/thickness (SWT), flatness index (FI), and 100-seed weight (HSW) were evaluated in LM6 and ZM6 populations under all environments. Phenotypic data were measured and recorded according to standard procedures (Tomooka et al., 2002;Cheng et al., 2006). In brief, seeds harvested from 10 guarded plants in the middle of each row were used for estimating SL, SW, ST, and HSW. The SL was measured as the longest dimension over the seed equivalent to the hilum. SW was measured as the longest dimension across the seed vertical to the hilum. ST was measured as the longest dimension from top to bottom of the seed. The SL, SW, and ST were estimated in millimeters (mm) using the Vernier caliper instrument, according to Kaushik et al. (2007) (Figure 1). Seed shape was identified by calculating three different ratios, i.e., SL/SW (SLW), SL/ST (SLT), and SW/ST (SWT), and FI. The ratios between the SL, SW, and ST were estimated from the individual values of the length, width, and thickness of the seeds according to Omokhafe and Alika (2004), while FI was calculated following the formula elaborated by Cailleux (1945) and Cerdà and Garcıa-Fayos (2002) to describe seed shape: where L is the SL, W is the SW, and T is the ST. It extended from a value of 1 for the round seeds to more than 2 for skinny seeds. The HSW was expressed as an average of five measurements of 100 randomly selected seeds.
Descriptive statistics of the seed size, seed shape, and HSW traits were calculated using the SPSS software, version 24 2 . Analysis of variance (ANOVA) for each environment and the combined overall environments (CE) was performed using the PROC GLM procedure in SAS software based on the random model (SAS Institute Inc. v. 9.02, 2010, Cary, NC, United States). Broad-sense heritability (h 2 ) in individual environments was estimated as: whereas in the CE h 2 was estimated as follows: where σ 2 g , σ 2 e , and σ 2 ge are the variance components estimated from the ANOVA for the genotypic, error, and genotype × experiment variances, respectively, with r as the number of replicates and n as the number of environments. All the parameters were assessed from the expected mean squares in ANOVA. The Pearson correlation coefficient (r) between seed size, seed shape, and HSW traits was calculated from the mean data using the SAS PROC CORR with data obtained for CE (average across environments) for each population.

Construction of Genetic Maps and QTL Analysis
Genetic map information was obtained from the National Center for Soybean Improvement, Nanjing Agricultural University. High-density genetic maps of the ZM6 and LM6 populations comprise 2601 and 2267 bin markers by using the RAD-seq technique, respectively (Supplementary Table 1), which were constructed as previously reported . The total lengths of the ZM6 and LM6 genetic maps were 2630.22 and 2453.79 cM, with an average distance between the markers 1.01 and 1.08 cM, respectively (Supplementary Table 1

Mapping of Main-and Epistatic-Effect QTLs
The WinQTLCart 2.5 software  was employed to identify the M-QTLs using the average values of seed size, seed shape, and 100-seed weight from the individual environments and overall environments with the composite interval mapping model (CIM) (Zeng, 1994). The software running features were 10 cM window size, 1 cM running speed, the logarithm of odds (LOD) (Morton, 1955) threshold which was computed using 1000 permutations because of an experiment-wide error proportion of P < 0.05 (Churchill and Doerge, 1994), and the confidence interval which was determined using a 1-LOD support interval, which was controlled by finding the local on the two sides of a QTL top that is compatible with a reduction of the 1 LOD score. QTLs detected within overlapping intervals in different environments were considered the same (Qi et al., 2017). To identify the genetic effects of the QTLs, i.e., additive QTLs, additive × additive (AA), AE, and AA × environment (AAE), the mixed-modelbased composite interval mapping (MCIM) procedure was employed in the QTLNetwork V2.1 software (Yang et al., 2008). The critical F-value was calculated by a permutation test with 1000 permutations for MCIM. The effects of QTLs were assessed using the Markov Chain Monte Carlo (MCMC) approach. Epistatic effects, candidate interval selection, and putative QTL detection were estimated with an experimentwide error proportion of P < 0.05 (Yang et al., 2007;Xing et al., 2012).

In silico Identification of Candidate Genes
QTLs identified in two or more environments with an R 2 > 10% were considered as stable and major QTLs (Qi et al., 2017). Genomic regions with several M-QTLs related to different studied traits were identified as a QTL cluster.
The Phytozome 3 and SoyBase (see text footnote 1) online platform repositories were employed to retrieve all model genes within the physical interval position of the QTL clusters. Potential candidate genes were identified based on gene annotations (see text footnotes 1 and 3) and the reported putative function of genes implicated in these traits. Gene ontology (GO) enrichment analysis was performed for the identified candidate genes within each QTL cluster region using AgriGO V2.0 4 (Tian et al., 2017). Gene classification was then carried out using the Web Gene Ontology (WeGO) Annotation Plotting tool, Version 2.0 (Ye et al., 2006). The publicly available RNA-Seq database on the SoyBase website was used to analyze the expression of the candidate genes in various soybean tissues and developmental stages. A heat map to visualize the fold-change patterns of these candidate genes was constructed using the TBtools_JRE 1.068 software (Chen C. et al., 2020).

Phenotypic Variations in RIL Populations in Multiple Environments
All measured (SL, ST, SW, and HSW) and calculated (SLW, SLT, SWT, and FI) phenotypic traits exhibited significant differences among the three parental lines across all environments as indicated by the ANOVA (Supplementary Tables 2 and 3). ANOVA revealed that all studied traits were significantly (P < 0. 001 or <0.05) influenced by the environment, genotypes, and the genotype × environment interaction (Supplementary Tables 4, 5), indicating the differential response of the genotypes to the changes in environmental cues. The two populations showed continuous phenotypic variations in all studied traits, implying a polygenic inheritance of these traits (Figure 3). The differences in mean phenotypic values among the three parental lines for seed size, seed shape, and HSW traits were constantly high across all studied environments, and their multi-environment means for both populations (Figure 4). Compared to the male parent, the female parent of the LM6 population, Linhefenqingdou, exhibited an average increase of 27.80, 28.19, 31.10, and 41.37% in SL, ST, SW, and HSW, respectively. Meanwhile, in the ZM6 population the female parent Zhengyang surpassed the male parent M8206 by an average of 11. 00, 9.66, 7.65, and 17.53% in SL, ST, SW, and HSW across all environments, respectively (Figure 4, Supplementary Tables 2, 3). In both populations, several lines overstep their parents in both directions in all studied traits across all environments, suggesting the occurrence of transgressive segregations within the two populations (Figures 3, 4). The broad-sense heritability (h 2 ) under individual environments ranged from 66.75 to 98.08%, 64.39 to 95.72%, and 81.45 to 99.36% for seed size, HSW, and seed shape, respectively (Supplementary Tables 2, 3). Meanwhile, h 2 under combined environments (CE) ranged from 78.25 to 87.31%, 65.70 to 90.80%, and 92.95 to 95.72% for seed size, shape, and HSW, respectively. The correlation coefficient (r 2 ) among SL, ST, and SW exhibited significant positive correlations with each other and with two of the seed shape traits (SLT and SLW) in both populations with r 2 values ranging from 0.79 to 0.91. Meanwhile, SL, ST, and SW exhibited significant negative correlations with the other two seed shape traits (SWT and FI) (Supplementary Table 6). Except for the correlation between SLW and SWT, all the seed shape traits showed significant positive correlations with each other in both populations with r 2 values ranged from 0.33 to 0.95. All seed size traits, i.e., SL, SW, and ST, showed significant positive correlations with HSW with r 2 values ranging from 0.29 to 0.70 in both populations.

Validation of Identified QTLs Employing Two Mapping Approaches
A total of 92, 99, and 48 M-QTLs associated with seed size, seed shape, and HSW, respectively, were mapped by the CIM approach (Supplementary Tables 7-9). Meanwhile, forty-three QTLs were identified for seed size, shape, and HSW by using MCIM approach (Tables 1, 2). Among these, 22 QTLs were identified and validated by both approaches within the same physical chromosomal position, indicating the dependability and stability of these QTLs. A comparison of the physical chromosomal regions of the QTLs detected by both approaches revealed the identification and validation of four QTLs, i.e., qSL-7-1 LM6 , qSW-19-2 LM6 , qFI-3-1 LM6 , and qHSW-3-2 LM6 , for the first time in the two populations (LM6 and ZM6) with an R 2 > 10%. Therefore, we considered these QTLs as novel stable and major QTLs that could be used for map-based cloning, candidate gene identification, and QTL stacking into elite cultivars targeted at improving seed size, shape, and HSW in soybean.

Identification of the Main Effects of the Stable Additive Seed Size QTLs
A total of 92 M-QTLs were mapped for seed size-related traits, i.e., SL, SW, and ST, on all soybean chromosomes, except chromosomes 1 and 12, with LOD scores and R 2 values ranging from 2.5 to 10.3 and 5.0 to 19.7%, respectively, in the two populations (Supplementary Table 7 and Supplementary  Figures 1a-c). Out of these, 30 M-QTLs for SL, 35 for SW, and 27 for ST with alleles underlying QTLs inherited from either of the parents. Seventy-two M-QTLs were mapped in one environment while the remaining 20 were mapped within overlapping regions in at least one environment with or without CE. Forty-seven QTLs exhibiting R 2 > 10% were considered as major QTLs. The most prominent QTL was the qSW-17-2 LM6 (LOD = 6.70-10.29, and R 2 = 16.60-18.30%), which was detected within the physical position 6,844,412-9,645,325 bp in 14JP and CE (Figure 5b). Likewise, the qSL-10-2 ZM6,LM6 (LOD = 6.08-6.89, and R 2 = 15.4-17.1% in ZM6 (17JP) and LM6 FIGURE 3 | Boxplot for seed size, seed shape, and 100-seed weight traits. The black line in the middle of the box shows the median, the white box indicates the range from the lower quartile to the upper quartile, and the dashed black line and yellow dots represent the dispersion and frequency distribution of the phenotypic data in each of the six environments, i.e., 12FY, 12JP, 13JP, 14JP, 14YC, and 17JP, while (A,B) represent LM6 and ZM6 populations.
(14JP) populations) was located to the physical position between 41,454,163 and 43,944,243 bp.

Analysis of Additive-Effect QTLs and Additive × Environment QTL Interactions
The mixed-MCIM approach implemented in the QTL Network V2.1 software for both RIL populations across multiple environments identified 35 AA QTLs on 17 chromosomes related to seven seed size and seed shape traits. These comprise 9, 3, 7, 3, 4, 1, and 8 A QTLs associated with SL, SW, ST, SLW, SLT, SWT, and FI, respectively, in the LM6 and ZM6 populations across all environments ( Table 1). The contributed allele of 11 QTLs of them which reduces seed size and seed shape values through significant additive effects is inherited from the M8206 parent. Meanwhile, the contributed allele of the remaining 24 QTLs, which enhances seed size and shape values through significant additive effects, is inherited from either Zhengyang or Linhefenqingdou parent of the ZM6 or LM6 population, respectively ( Table 1). Thirteen out of 35 QTLs revealed significant AE effects in at least one environment. However, five QTLs, i.e., qSW-13-5 LM6 , qSW-19-2 LM6 , qFI-8-6 LM6 , qSL-10-2 ZM6 , and qST-13-5 ZM6 , showed significant or highly significant AE among all studied environments ( Table 1).

Analysis of Epistatic-Effect QTLs and Epistatic × Environment QTL Interactions
Analysis of the seed size and shape trait data under all environments identified 38 pairwise epistatic effects (AA) QTLs, from which 2, 13, 6, 2, 3, 5, and 7 pairs were related to SL, SW, ST, SLW, SLT, SWT, and FI traits, respectively, with R 2 values ranging 0.51-11.35% (Table 3). All QTL pairs displayed a high significant AA effect. Further analyses revealed that 20 AA QTLs showed significant or highly significant pairwise additiveadditive-environment (AAE) interaction effects in at least one environment with R 2 values ranging from 0.13 to 5.31% (Table 3).

Candidate Gene Mining of the Main-Effect QTLs
The 24 M-QTL clusters were filtered based on the richness in QTLs associated with all or some of the seed size, shape, and HSW traits. As a result, seven QTL clusters, i.e., cluster-03, 04.  (Tian et al., 2017) were used to classify the model genes in each cluster. The classification was based on molecular function, biological process, and cellular components visualized on the Web-based GO (WeGO) V2.0 https://wego.genomics.cn (Ye et al., 2006). In all seven QTL clusters, high percentages of genes were related to catalytic activity, cell part, cell, cellular process, binding, and metabolic process terms, besides the response to stimulus in cluster-03 (Figure 6). These indicate the essential roles of these terms in the seed size, shape, and seed weight development in soybean. Probable candidate genes underlying these QTL clusters responsible for seed size, shape, and HSW in soybean were further predicted based on gene annotations, GO enrichment    Table 12). These genes may function directly or indirectly in regulating seed development in soybean, which regulates seed size, shape, and HSW. These genes are involved in response to brassinosteroid stimulus, regulation of cell proliferation and differentiation, regulation of transcription, secondary metabolism and signaling, storage of proteins and lipids, hormone-mediated signaling pathway, regulation of the cell cycle process, transport, ubiquitin-dependent protein catabolic process, embryonic pattern specification, and response to auxin stimulus (Table 5). However, the RNS-seq data of genes in the soybean genome (Severin et al., 2010) that is publicly available on SoyBase was used to heatmap the expression of those candidate genes in the young leaf, flower, pod, seed, root, and nodule (Figure 7 and Supplementary Table 13). From the heatmaps, 47 genes out of the identified 143 candidate genes are highly expressed during seed developmental stages and in seed-related tissues (Figure 7 and Supplementary Table 13); hence, they could be potential seed size, shape, and HSW regulatory genes.

DISCUSSION
The present study has implemented high-density genetic maps constructed from two-related RIL populations LM6 and ZM6 comprising 2,267 and 2,601 bin markers, respectively , to validate QTLs associated with seed size, shape, and weight. To minimize the environmental errors, the two RIL populations were evaluated in four environments. The transgressive segregation and continuous variations observed in the two populations in all studied phenotypic traits facilitate the identification of a high number of both major and minor effect QTLs including some novel QTLs associated with all studied traits (Teng et al., 2009;Xu et al., 2011;Zhang et al., 2018). All measured and calculated traits in both populations were significantly (P < 0.01) influenced by genotype (G), environment (E), and their interactions (G × E), suggesting that the seed size, shape, and weight traits are not only governed by both genetic and environment; however, there was an effect of the G × E interaction as well (Sun et al., 2012;Hu et al., 2013;Liang et al., 2016). This explains the observed high h 2 (99.04%) and accordingly deduces that these traits are amenable to manipulation by selection without the help of molecular markers. Except for SL, SW, and ST that exhibited a highly significant correlation between each other and with HSW, our data showed that seed size, shape, and weight traits are not correlated, which is favorable when breeding for a round type with smaller or bigger seed size (Cober et al., 1997;Salas et al., 2006). For validation of identified QTLs, a comparative QTL analysis using the CIM QTL mapping approach with the SoyBase database identified 69, 82, and 29 novel QTLs for seed size, shape, and HSW, respectively, indicating the distinct genetic architecture of the LM6 and ZM6 populations. These novel QTLs together explain over 88.00% of phenotypic variance for seed size, shape, and weight, signifying their potential value for improving soybean cultivars. Besides, the identification of novel QTLs in the present study suggests that more germplasms are required for unraveling the complex genetic basis for seed size and shape traits in soybean. Among these novel QTLs, eight novel major QTLs associated with HSW where their physical intervals did not overlap with any of the previously reported HSW QTLs, suggesting them as potential loci for HSW and major QTLs for future fine mapping to delimit the physical interval. Numerous QTLs associated with SW, HSW, SLW, and SLT identified in this study are co-localized with previously reported corresponding QTLs (Salas et al., 2006;Li et al., 2010;Moongkanna et al., 2011;Hu et al., 2013;Jun et al., 2014;Fang et al., 2017;Hacisalihoglu et al., 2018;Hina et al., 2020). Our study identified for the first time 13 major QTLs (R 2 > 10%) related to FI; thus, we considered them as novel QTLs. Besides, Chr01 and Chr03 harbored four and three FI QTLs, suggesting crucial roles of Chr 01 and 03 in controlling the inheritance of seed FI in soybean. The positive alleles for seed size, shape, and HSW traits were inherited from both parents of the two RIL populations. Therefore, it is likely that not only the higher seed size and heavyweight parent (Linhefenqingdou or Zhengyang) contributed favorable alleles but also the lighter seed weight parent (M8206) might play a role Hina et al., 2020).
Mapping of QTLs associated with seed size, shape, and weight-related traits using the MCIM approach was performed to (i) dissect the additive effect QTLs and Q × E interactions, which is essential for selecting the most compatible varieties adapted to particular environments, and (ii) further validate the QTLs identified by the CIM approach. The MCIM approach identified 18 QTLs for seed sizes, shapes, and weight traits that are co-localized in the same physical interval of the CIM-mapped QTLs. Therefore, these QTLs could also be stable QTLs for further fine mapping and map-based cloning to uncover the genetic control and mechanisms of seed size, shape, and weight traits in soybean, and molecular markers tightly linked to these QTLs could be used for MAS.
Dissecting the epistatic and QTL × environment effects are crucial for understanding the genetic mechanisms that contributed to the phenotypic variations of complex traits (Kaushik et al., 2007). Disregarding intergenic interactions will lead to the overestimation of individual QTL effects, and the underestimation of genetic variance resulting in a large drop in the genetic response to MAS especially in late generations (Nyquist and Baker, 1991;Zhang et al., 2004). The identified 40 pairwise digenic epistatic QTLs for seed size, shape, and weightrelated traits in the present study could be considered as modifying genes that do not exhibit only additive effects but could affect the expression of seed size, shape, and weightrelated genes through epistatic interactions. Similar results for the epistatic interaction of seed size, shape, and weight QTLs have been also previously reported by Xin et al. (2016) and Zhang et al. (2018). The appearance of epistatic interactions for a specific trait makes selection difficult. Noteworthily, all main-effect QTLs detected in our study had no epistatic effect, which raises the heritability of the trait guiding to easier selection.
Genomic regions were identified as QTL clusters based on the presence of several QTLs related to all or some of the seed size, shape, and HSW traits. Accordingly, 24 QTL clusters were identified on 17 chromosomes each containing three or more QTLs related to seed size, shape, and HSW traits. These QTL clusters have not been previously reported, which enhances the developing knowledge of the genetic control of these traits. The co-localization of QTLs for seed size, shape, and HSW and how they have exceptionally corresponded support the highly significant correlation with each other (Cai and Morishima, 2002) (Supplementary Table 10). Besides, the occurrence of the QTL clustering could signify a linkage of QTLs/genes or outcome from the multiple effects of one QTL in the same genomic region Cao et al., 2017;Liu et al., 2017). The QTL clusters reveal that the QTL linkage/gathering could make the enhancement of seed size and shape easier than single QTLs (Hina et al., 2020). Significant positive correlations of soybean seed protein and oil contents and seed yield with seed size and seed shape have been shown; therefore, these traits are directly associated with seed size and shape in soybean (Qi et al., 2011;Hacisalihoglu and Settles, 2017;Wu et al., 2018). This notion would explain the co-localization of QTLs associated with seed protein and oil contents in the genomic regions of several QTL (Panthee et al., 2005;Salas et al., 2006;Vieira et al., 2006;Moongkanna et al., 2011;Yang et al., 2011). The position of the first flower and the number of days to flowering have large effects on seed number per plant in soybean (Tasma et al., 2001;Yamanaka et al., 2001;Khan et al., 2008), which affects seed size and HSW indicating the existence of common genetic factors for these traits. QTLs associated with the position of the first flower identified previously (Tasma et al., 2001;Han et al., 2012) are located to the genomic region of clusters 16.1, 19.2, and 20 (Hyten et al., 2004). The extensive analysis of QTL clusters in our study suggests that breeding programs aiming to improve seed size, shape, and weight with enhanced quality should focus on QTL clustering and select QTLs within these regions. Besides, the existence of QTL clusters provides evidence that some traits-related genes are more densely concentrated in specific genomic regions of crop genomes than others (Fang et al., 2017).
Identification of candidate genes underlying QTL regions is of great interest for breeding programs (Abou-Elwafa, 2018; Abou-Elwafa and Shehzad, 2018). A bioinformatics pipeline implementing genomic sequences of identified QTL clusters was employed to identify candidate genes. The pipeline comprises three complementary steps, i.e., (1) retrieving candidate genes from the SoyBase database, (2) visualizing the molecular function of candidate genes by GO enrichment analyses and gene classification, and (3) implying candidate genes in seed size, shape, and weight based on their expression profiles. Accordingly, 47 genes were considered as potential candidates. Most of the identified candidate genes are related to the terms of catalytic activity, cell part, cell, cellular process, and binding and metabolic process as indicated by GO enrichment and gene classification analyses. These terms have functions related/involved in seed development, which influence the size, shape, and weight of seeds (Mao et al., 2010;Li and Li, 2014). For example, the Glyma07g14460 gene underlying QTL cluster-7 belongs to the oxygenase (CYP51G1) protein class, which has been confirmed to regulate seed size in soybean (Zhao et al., 2016). Furthermore, 10 candidate genes were identified as a regulator of ubiquitin-dependent protein catabolic process, RING-type E3 ubiquitin ligases, and lipid catabolic process ( Table 5). Several components of the ubiquitin pathway such as the ubiquitin-activating enzyme (E1), ubiquitin-conjugating enzyme (E2), and ubiquitin protein ligase (E3) have been reported to play important roles in regulation seed and organ size (Li and Li, 2014). Similarly, 16 candidate genes have functions in pollen tube development, embryo sac egg cell differentiation, post-embryonic development, regulation of seed maturation, positive regulation of gene expression, regulation of cell cycle process, ovule development, anther development, seed dormancy process, and seed maturation ( Table 5), and hence they are likely to participate in regulating seed size, shape, and weight in plants, including soybean (Meng et al., 2016). Ten candidate genes are involved in response to auxin stimulus, response to ethylene stimulus, and abscisic acid biosynthetic process which are known to be implicated in promoting seed size and weight in Arabidopsis ( Table 5) (Xie et al., 2014). Six genes play functions in the glucose catabolic process, phosphatidylcholine biosynthetic process, carbohydrate metabolic process, maltose metabolic process, and starch biosynthetic process and are implicated in the partitioning and translocation of photoassimilates and grain filling in rice ( Table 5) Zhang et al., 2020).

CONCLUSION
QTLs associated with seed size, shape, and weight in soybean were identified and validated using two mapping approaches in two populations across multiple environments. This is the first comprehensive investigation of the identification and validation of QTLs for the FI as a seed shape trait in soybean. Employing a bioinformatics pipeline identified candidate genes behind genomic regions harboring major and stable QTL clusters underlying the inheritance of seed size, shape, and weight. The implemented bioinformatics pipeline delimits the number of the identified candidate genes to 47-gene genomic regions involved directly or indirectly in seed size, shape, and weight. These genes are highly expressed in seed-related tissues and nodules, indicating that they may be involved in regulating these traits in soybean. Furthermore, some of the potential 47 candidate genes have been included in our ongoing projects for functional validation to confirm their effect on seed size, shape, and weight. Our study provides detailed information for genetic bases of the studied traits and candidate genes that could be efficiently implemented by soybean breeders for fine mapping and gene cloning and for MAS targeted at improving seed size, shape, and weight.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/ Supplementary Material.

AUTHOR CONTRIBUTIONS
TZ designed the project. ME performed the experiments and drafted the manuscript. ME, BK, SS, SL, YC, MA, and AH analyzed the data. TZ and SA-E revised the manuscript.

SUPPLEMENTARY MATERIAL
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fgene.
Supplementary Table 1 | Distribution of SNPs, recombination bins and markers mapped on soybean chromosomes/linkage groups. Bin-map (RAD-sequencing).
Supplementary Table 2 | Descriptive statistics, variance components, and broad-sense heritability (h 2 ) of seed shape and size traits evaluated in two recombinant inbred lines (RILs) LM6 and ZM6. The RILs and their parents were grown under different environments. * and * * represent significance at 5 and 1%, respectively.
Supplementary Table 3 | Descriptive statistics, variance components, and broad-sense heritability (h 2 ) for 100-Seed weight trait evaluated in two recombinant inbred lines (RILs) LM6 and ZM6. The RILs and their parents were grown under different environments. * and * * represent significance at 5 and 1%, respectively.