Identification of Additive–Epistatic QTLs Conferring Seed Traits in Soybean Using Recombinant Inbred Lines

Seed weight and shape are important agronomic traits that affect soybean quality and yield. In the present study, we used image analysis software to evaluate 100-seed weight and seed shape traits (length, width, perimeter, projection area, length/width, and weight/projection area) of 155 novel recombinant inbred soybean lines (NJRISX) generated by crossing “Su88-M21” and “XYXHD”. We examined quantitative trait loci (QTLs) associated with the six traits (except seed weight per projection area), and identified 42 additive QTLs (5–8 QTLs per trait) accounting for 24.9–37.5% of the phenotypic variation (PV). Meanwhile, 2–4 epistatic QTL pairs per trait out of a total of 18 accounted for 2.5–7.2% of the PV; and unmapped minor QTLs accounted for the remaining 35.0–56.7% of the PV. A total of 28 additive and 11 epistatic QTL pairs were concentrated in nine joint QTL segments (JQSs), indicating that QTLs associated with seed weight and shape are closely related and interacted. An interaction was also detected between additive and epistatic QTL pairs and environment, which made significant contributions of 1.4–9.5% and 0.4–0.8% to the PV, respectively. We annotated 18 candidate genes in the nine JQSs, which were important for interpreting the close relationships among the six traits. These findings indicate that examining the interactions between closely related traits rather than only analyzing individual trait provides more useful insight into the genetic system of the interrelated traits for which there has been limited QTL information.


INTRODUCTION
Soybean (Glycine max [L.] Merr.) is widely cultivated and consumed, accounting for 70% of protein meal and 28% of vegetable oil sources globally in 2018 (SoyStat, 2019). As yield is the most important target trait for improvement of this crop, its closely related trait -seed size including seed weight and shape (volume) -have been widely investigated (Fang et al., 2017;Yu et al., 2017). Seed size, which is measured as 100-seed weight (100-SW), is a fitness trait that is critical for adaptation to a particular environment (Tao et al., 2017). Seed shape (volume) traits have also been the focus of improvement by farmers and soybean breeders (Garg et al., 2017); round seed varieties are more suitable for sowing and mechanical handling, and large-seed soybeans are more attractive for direct consumption (e.g., as edamame) while small-seed varieties are consumed as sprouts. Thus, 100-SW and seed shape not only affect production and processing, but also influence growers' preferences in the propagation of a cultivar (Maughan et al., 1996).
While 100-SW of soybean can be easily measured, seed shape has not been well studied because the traits are ill-defined and their measurements are tedious and inaccurate. Imaging technology has recently been applied to the study of crop phenotypes (Price et al., 2011). For instance, a computer imagebased software has been developed that can accurately measure soybean seed morphology traits including seed length (SL), seed width (SW), seed perimeter (SP), and seed projection area (SA) (Ding et al., 2019), which can be used to calculate seed length-towidth ratio (SLW) and seed weight per projection area (SWA). This procedure is simple, accurate, and has high throughput compared to manual measurements using Vernier calipers.
Natural selection of larger seeds in soybean has resulted in an accumulation of minor QTLs (Liu et al., 2007), and QTL mapping has provided insight into these evolutionary changes (Salas et al., 2006;Xu et al., 2011;Hina et al., 2020). Early, composite interval mapping (CIM) using Windows QTL Cartographer  or inclusive composite interval mapping (ICIM) using QTL IciMapping (Li et al., 2007) were applied to QTL mapping. However, the CIM and ICIM methods can only detect additive QTLs and additive-epistatic QTLs, while providing no QTL × environment information. In contrast, the mixedmodel-based composite interval mapping (MCIM) of QTL Network can detect additive, epistatic, and QTL × environment interactions, thus providing more detailed QTL information (Yang et al., 2008).
Linkage mapping has been widely used in detecting quantitative trait loci (QTL) of soybean 100-SW (Stombaugh et al., 2004;Kato et al., 2014;Yang et al., 2019). To date, 304 QTLs have been identified for 100-SW in soybean; most are minor QTLs and the candidate genes have yet to be validated (Karikari et al., 2019). There are 29 QTLs for SL mapped to 13 chromosomes, 25 for SW on 13 chromosomes, and 18 for SLW on 11 chromosomes in SoyBase 1 .
Besides, genome-wide association studies (GWAS) has been also widely used to analyze soybean 100-SW and seed traits for natural populations (Zhang et al., 2016b;Li et al., 2019;Zhao et al., 2019;Qi et al., 2020). For example, a new seed size locus SW9-1 was found through the GWAS study for 100-SW, SL and SW . GWAS is powerful in detecting additive QTLs both for natural population and bi-parental population, but if epistatic QTLs are involved, new GWAS procedure is not available yet. Therefore, in the present study, the linkage mapping procedure MCIM of QTL Network will be considered for a recombinant inbred line (RIL) population.
Given the lower marker density in early genetic linkage maps, chromosome regions potentially harboring 100-SW and seed shape QTLs were too broad and often overlooked, leading to imprecise mapping that was not useful for identifying candidate genes for marker-assisted breeding. Recent advances in sequencing technology have led to the discovery of new markers and the generation of high-density molecular genetic linkage maps to detect 100-SW QTLs for annotated candidate genes in soybean (Karikari et al., 2019). Thus, a high-density molecular genetic map is a basic requirement for QTL fine mapping and candidate gene discovery (Zhang et al., 2016a;Wei et al., 2019). 100-seed weight and seed shape are heritable traits conferred by both major and minor genes . For example, the GmGA20OX gene affecting SL, SW, and 100-SW encodes an enzyme involved in gibberellin synthesis (Lu et al., 2017), while the Arabidopsis homolog of GmCYP78A10, regulates SL, SW, seed thickness, and seed weight in soybean (Wang et al., 2015). The SoyWRKY15a gene associated with soybean seed volume and weight was identified through a combination of RNA sequencing and QTL mapping (Gu et al., 2017). However, the up-/downstream relationships of these genes and the mechanisms by which they regulate seed traits during plant development are unclear. The aim of the present study was to identify a QTL system for seed weight and morphology including additive, epistatic, and QTL × environment interactions of 100-SW, SL, SW, SP, and SA as well as SLW and SWA using an enhanced high-density genetic linkage map for a population of the RIL NJRISX. From the QTL system, candidate genes were annotated and validated using published RNA expression datasets. As seed weight and shape traits are interrelated, we speculated that their genetic constitutions may be somewhat overlapping, therefore, the genetic relationships among these traits were also examined.

Plant Materials and Field Experiments
A population of the RIL NJRISX was established from 155 F 2derived homozygous lines obtained by crossing "Su 88-M21" and "XYXHD". The large-sized round seed of "Su 88-M21", smallsized oval seed of "XYXHD" and the RIL population (NJRISX) derived from the cross were provided by the National Soybean Improvement Center of Nanjing Agricultural University. RILs along with the parents were tested in three environments: Jiangpu Experimental Station of Nanjing Agricultural University, Nanjing, Jiangsu Province (latitude 33 • 03 N; longitude 118 • 63 E) in June 2017 (17JP); Wanjiang Station of Nanjing Agricultural University, Dangtu County, Anhui Province (latitude 32 • 87 N; longitude 117 • 56 E) in June 2017 (17DT); and the same location in June 2019 (19DT). Each line was planted in a single row plot (length × width, 1 × 0.5 m) in a randomized complete block design with three replicates. Uniform agronomic practices were used in the experiments.

Measurement of 100-Seed Weight and Seed Shape Traits
After harvesting, seeds were dried to a uniform moisture. Diseased, insect-infested, and physically damaged seeds were removed. Seed weight was measured using an electronic balance with 0.001-g accuracy. SL, SW, SP, and SA were measured from images using a high-speed camera (Model eloam-S1500A2; Shenzhen E-Loam Technology Co., Shenzhen, China). About 120 seeds per plot were collected and distributed evenly in the middle of the light-emitting diode backlight board of the equipment to obtain clear and complete images of the seeds. The board was calibrated and adjusted to an appropriate brightness according to the indoor light level. The equipment parameters were as follows: brightness, 64 cd/m 2 ; contrast, 15; hue, 0; saturation, 43; clarity, 100; gamma, 100; white, balance 4600 auto, backlight contrast 0, power line frequency (anti-flicker) 50 Hz, focus 65, exposure −6; and scan size, 640 × 480 mm. The acquired images were processed using previously developed software (Ding et al., 2019). The image background was removed based on the "Otsu" threshold method to obtain the binary image of soybean seeds, and the adhesive soybean seeds in the binary map were segmented and counted based on the watershed transformation method. The sum of white pixels in each connected domain and the correction formula based on the freeman chain code algorithm is used to calculate seed area and perimeter, respectively. The second-order statistical moment is used to obtain the main axis direction of the soybean seeds and then the SL and SW was calculated by the extreme difference of the boundary point. Besides, this software can process hundreds of photos at a time and export the data to EXCEL. Data on the exact number of seeds on the backlight board, SL, SW, SP, and SA were directly obtained using the computer software and 100-SW was converted from seed number and seed weight values; SLW = SL/SW and SWA = 100-SW/SA were calculated. All seven seed weight and shape traits were measured in three replicates under three environmental conditions. Before measuring the experimental seeds, in order to confirm that images obtained of a seed lot from a given plot were reproducible, two seed samples of the 138 lines were obtained from a single replicate and SL, SW, SP, and SA were measured under the same conditions using the same instrument. The correlation coefficients between the two measured values of SL, SW, SP, and SA were 0.95, 0.92, 0.93, and 0.94, respectively, indicating good consistency between the measurements from the same seed lot (Supplementary Table S1) and validating the utility of the imaging procedure used in this study.

Statistical Analysis of Phenotypic Data
Data were analyzed using Excel 2016 software (Microsoft, Redmond, WA, United States). Analysis of variance (ANOVA) and correlation analysis were performed using SAS v9.4 software (SAS Institute, Cary, NC, United States). Heritability (h 2 ) was calculated as where σ 2 g , σ 2 gy , and σ 2 e are genotype, genotype × environment interaction, and error variance estimated from the expected mean squares in ANOVA, respectively; n is the number of environments; and r is the number of replicates. The genotypic coefficient of variation (GCV) was calculated as GCV = σ g /µ, where µ is the mean value of the RIL population.

Specific-Locus Amplified Fragment Sequencing (SLAF-seq) and Genetic Linkage Map Construction
In 2015, two parents and 155 progeny soybean lines were planted at Jiangpu Experiment Station. Genomic DNA was extracted from young leaves and SLAF-seq (Sun et al., 2013;Zhang et al., 2016a) was performed by the Biomarker Technologies Corporation (Beijing, China). Briefly, the soybean genome sequence (G. max, Wm82.a1. v1) was used as a reference to predict digestion sites. RsaI and HaeIII were used to digest the genomic DNA and the obtained fragments (SLAF tags of 364-414 bp) were processed for target selection. After passing library quality inspection, sequencing was performed with the HiSeq 2500 system (Illumina, San Diego, CA, United States). Rice (Oryza sativa) 2 was processed in the same manner and served as the control for library construction and sequencing to determine whether the enzymes used in this experiment were activated or there were other quality problems in library construction. A total of 281.93 M reads were generated for the two parents and 155 recombinant inbred lines. After discarding low-quality reads, 247,821 SLAF tags were screened; of these, 207,180 and 212,958 were identified from female parent (Su 88-M21) and male parent (XYXHD), respectively, with sequencing depths of 41.56 and 44.04 fold, respectively. There were 150,515 SLAFs in the RILs with 7.75-fold coverage on average, corresponding to 1,473,406 reads. The reads of each identified sample were analyzed by clustering for SLAF tag screening. After removing low-quality reads, 52,988 tags that matched the parental separation pattern (Sun et al., 2013) were identified as polymorphic in the whole RIL population; these were screened according to SLAF tag filtering rules (Sun et al., 2013), yielding 9625 markers for linkage analysis.
Markers with high collinearity were removed for map construction, leaving 5351 SLAF markers. To obtain highquality molecular tags, modified logarithm of odds scores between the tags were calculated and used for linkage grouping. HighMap software  was used to construct a genetic map for each linkage group. The software uses an efficient maximum likelihood estimation method to correct label classification based on the layout results, and after multiple cycles of layout correction-layout, a high-quality map is obtained. Map quality was evaluated in terms of co-linearity, genetic relationships, and monomer sources. The linkage map (Supplementary Figure S1) was drawn using the R-based software LinkageMapView (Ouellette et al., 2018).

Mapping QTLs Conferring 100-Seed Weight and Seed Shape Traits
The MCIM function of QTL Network v2.0 was used to detect additive QTLs, additive × additive epistatic QTL pairs, and additive QTL × environment and epistatic QTL pair × environment interactions. The critical F value of MCIM was calculated with 1,000 permutation tests. The QTL effects were estimated by using the Monte Carlo Markov Chain method with 20,000 Gibbs sampler iterations and candidate interval selection; putative QTL detection and QTL effects were calculated with an experiment-wise type I error under a = 0.05 (Yang et al., 2007;Xing et al., 2012). The CIM method of Windows QTL Cartographer v2.5 software was used to perform CIM scanning of chromosomes to identify additive QTLs of soybean seed traits in different environments and verify the QTL mapping results of QTL Network v2.0 software. For CIM, the LOD significance threshold determined empirically using 1,000 permutation tests. Neighboring QTLs of different seed traits within the same support interval were grouped into a joint QTL segment (JQS). The 1-logarithm of the odds support (confidence) intervals were calculated using the QTL network procedure (Yang et al., 2007), which is defined by points on the genetic map at which the likelihood ratio has fallen from the maximum by a factor of 10 (Lander and Botstein, 1989).
Positive and negative additive effects indicated that the alleles were from "Su 88-M21" and "XYXHD" for all seed traits, respectively. Genetic contribution of the collective unmapped minor QTLs is equal to total genetic contribution minus variation explained by all detected additive and epistatic QTLs (Korir et al., 2011). Random error variation is equal to total phenotypic variation minus total genetic contribution, and variation explained by all detected additive QTL × environment and epistatic QTL× environment interactions (Xing et al., 2012).

Annotation of Candidate Genes Conferring 100-Seed Weight and Seed Shape Traits
Candidate gene annotation was carried out for JQSs based on physical locations in SoyBase 3 . Gene ontology (GO) annotation v1.1 was downloaded from SoyBase 4 and GO classification was based on clusterProfiler package  in R software (p value < 0.01, q value < 0.05) to identify terms related to seed weight and shape traits in soybean. To determine whether these genes are expressed in seeds, gene expression data of 14 soybean tissues at seven seed development stages (seed_10DAF, seed_14DAF, seed_21DAF, seed_25DAF, seed_28DAF, seed_35DAF, and seed_42DAF, where DAF stands for days after flowering) and 7 other tissues (young_leaf, flower, one.cm.pod, pod.shell.10DAF, pod.shell.14DAF, root, and nodule) (Severin et al., 2010) were downloaded from SoyBase 3 . Expression data from cultivated soybean were used as an approximation for analyzing the parents in the present study. The candidate genes were classified according to Protein Class using Protein ANalysis THrough Evolutionary Relationships (PANTHER) 5 and annotated based on the National Center for Biotechnology Information (NCBI) 6 and UniProt Protein 7 databases to determine gene function.

Phenotypic Variations (PVs) in 100-Seed Weight and Seed Shape Traits
The seed of the maternal parent "Su 88-M21" is round and large, while that of the paternal parent "XYXHD" is flat and small ( Figure 1H); as these seeds differ significantly in terms of 100-SW, SL, SW, SP, SA, SLW, and SWA, it was possible to establish a RIL population with potential for genetic variation in these seven seed weight and shape traits (Table 1). Accordingly, the frequency distributions all showed a large variation (Figure 1 and Table 1), while transgressive segregation was observed for SL, SP, and SA in both directions (Figures 1B,D,E). In the joint ANOVA for multiple environments, there existed significant differences among Lines and Line × Env. interactions for all the traits except no significant differences among lines in SWA ( Table 2). The phenotypic data from multiple environments were used to estimate heritability. The heritability of the seven traits ranged from 60.3 to 88.2% (Table 1). These results showed that further QTL constitution analysis for the traits except SWA would be meaningful.
Of the seven seed traits, 100-SW, SL, SW, SP, and SA are of the first order and can be directly measured, while SLW and SWA are second-order traits that are calculated from first-order traits. To clarify their relationships, a correlation analysis was performed for these traits. The correlation coefficients of the five first-order traits ranged from 0.78 to 1.00, showing that they are closely related (Table 3). Notably, the correlation coefficient between SP and SA was approximately 1.00; additionally, the high correlation between 100-SW and the other four first-order traits (0.88-0.94) implied that they have a common genetic basis. For the two second-order traits, SLW -which is related to seed shape -was not correlated with 100-SW or SW, while SWArelated to seed volume weight -was not correlated with SLW, their other correlations with the other first rank traits were not high, therefore, are not attractive traits. In this case, the second-order trait SWA was neglected and excluded in the QTL mapping analysis.

Genetic Linkage Map and Genetics of 100-Seed Weight and Seed Shape Traits
A molecular genetic linkage map with 5351 SLAF-seq markers was constructed that spanned 3046.52 cM with an average intermarker interval of 0.57 cM. Chr9 harbored the most markers at 500 and Chr11 had the fewest at 80; the latter spanned the shortest distance at 106.38 cM, whereas markers on Chr9 covered the largest distance at 199.24 cM (Supplementary Figure S1 and Supplementary Table S2). By comparing the physical positions of markers on each chromosome, a consistent relationship between physical and genetic distances was observed on all chromosomes except Chr11 and Chr17, indicating that the genetic map was of good quality (Supplementary Figure S2).
35.0-56.7%) was not explained by these QTLs, and were instead attributed to a collection of undetected minor QTLs (Table 4). In addition to these QTLs, the PV was explained by additive QTL × environment interaction (1.4-9.5%), epistatic QTL × environment interaction (0.4-0.8%), and random error (2.4-14.0%). Thus, the largest portion of the genetic variation was explained by additive QTLs, with epistatic QTL pairs accounting for only a small part of the genetic variation in the six seed traits. Identification of the unmapped minor QTLs, which collectively accounted for a relatively large portion of the The decimal values in the lower left corner represent the correlation between traits, and the integer values in the upper right corner represent the number of joint QTL segments shared between traits. Pearson correlation coefficients were calculated from the average of three environments. **Significant at a 0.01 probability level. 100-SW, 100-seed weight; SL, seed length; SW, seed width; SP, seed perimeter; SA, seed projection area; SLW, ratio of seed length-to-width; SWA, seed weight per projection area. a,b Additive and epistatic QTLs: variation explained by all additive and epistatic QTLs for this trait. Numbers in the first pair of parentheses in the "Additive QTL" and "Epistatic QTL" columns are the contributions of the QTL to total genetic variation, while those in second pair of parentheses are the numbers of QTL and QTL pairs, respectively. c Minor QTL: (genetic contribution of the collective unmapped minor QTLs) = (total genetic contribution) -(variation explained by all detected additive and epistatic QTLs) (Korir et al., 2011). The numbers in parentheses are the contributions of minor QTL to the total genetic variation. d Total genetic contribution. e Additive QTL × environment: variation explained by all detected additive QTL × environment interactions. f Epistatic QTL × environment: variation explained by all detected epistatic QTL × environment interactions. g Random error = (total phenotypic variation)−(total genetic contribution)−(variation explained by all detected additive QTL × environment and epistatic QTL × environment interactions) (Xing et al., 2012). h Total phenotypic variation.
genetic variation, depends on improvements in the precision and sensitivity of mapping procedures. In addition, there were both additive and epistatic QTL × environment interactions, but these explained only a small part of the PV. Overall, the genetic and gene × environment components of the six traits were similar. Table 5 shows information on each of the additive QTLs of the six seed traits. A total of 42 QTLs distributed on 13 chromosomes were identified by MCIM of the QTL network, of which 25 were also identified by CIM using Windows QTL Cartographer. For 100-SW, eight QTLs were identified on six chromosomes, each accounting for 1.8-8.2% of the PV (Figure 2 and Table 5). Two QTLs, q100SW-6-1 and q100SW-19-1 interacted with the environment and contributed 3.9 and 2.5% PV, respectively. Three QTLs have been previously reported in the literature (Supplementary Table S3). Seven QTLs on seven chromosomes were identified for SL, accounting for 3.3-6.8% of the PV; two have been previously reported. For SW, five QTLs on five chromosomes were identified, accounting for 2.3-8.4% of the PV; one QTL has been previously reported. For SP, eight QTLs on eight chromosomes were identified, accounting for 2.1-7.8% of the PV. Eight QTLs on eight chromosomes were identified for SA, accounting for 2.2-8.4% of the PV. For SLW, six QTLs on six chromosomes were identified, each accounting for 3.6-10.1% of the PV; two have been reported in the literature. The major QTL qSLW-2-1 was located at 52.2-54.9 cM on Chr2, accounting for 10.1% of the PV, and was detected by Cartographer under all three environmental conditions. These seed trait additive QTLs interacted with the environment and their phenotypic contribution ranged from 0 to 3.9%; this is not a large proportion, it indicated that the identified QTLs were relatively stable. Some of the identified additive QTLs interacted with each other to form significant epistatic QTL pairs involving all traits. There were three epistatic QTL pairs for 100-SW, two for SL, two for SW, four for SP, four for SA and three for SLW, with contributions to PV ranging from 0.6 to 3.0% for a single epistatic QTL pair (Figure 2 and Table 6) and 2.5-7.2% for a single trait. These low rates indicated that there was epistasis among genes governing seed weight and shape traits, although epistatic QTL pairs accounted for just a small part of the PV. Epistatic QTL pairs also interacted with the environment, with phenotypic contributions ranging from 0 to 0.7%; they also explained 0.4-0.8% of the PV for a single trait, indicating that they were relatively stable across environments.

JQSs Related to 100-Seed Weight and Seed Shape Traits
A QTL of a trait may be located in the same chromosomal region as QTL(s) of other trait(s). These QTLs may be either different loci or the same locus (referred to as JQSs) due to random shifting. Of the 42 identified QTLs, 28 were located in nine JQSs on nine chromosomes ( Table 7).
Frontiers in Plant Science | www.frontiersin.org Total 9 28 QTLs in JQSs 8 QTLs of (a) 11 pairs of (aa) *Epistatic QTL pairs with a bold QTL is reciprocal and duplicated. † Defined as neighboring QTLs within overlapped support (confidence) intervals, which were calculated using the QTL network procedure (Yang et al., 2007, originally in Lander andBotstein, 1989). QTL-i and QTL-j are an epistatic QTL pair composed of QTL-i and QTL-j. a, additive; aa, additive × additive interaction; JQS, joint QTL segment.
Among the nine JQSs, JQS-11 and -17 were independent of the others, whereas the remaining seven JQS interacted with other JQSs (epistasis) or else harbored epistatic QTL. Of these JQSs, JQS-19 interacted with JQS-6 and JQS-12 for multiple traits. Although there were no interactions among the other JQSs, interactions were observed at the individual QTL level. Thus, a single or multiple QTLs in a JQS may interact with a single or multiple QTLs in another JQS, while some QTLs are independent and do not interact with other QTLs. Among the 28 QTLs on nine JQSs, eight were independent; 11 epistatic QTL pairs were located on JQSs; three epistatic QTL pairs partly located on JQS, e.g., qSL-6-1 located on JQS, while qSL-19-1 isolated from JQS; and the remaining four epistatic QTL pairs were isolated from JQSs (Figure 2 and Tables 6, 7).

Functional Annotation of Candidate Genes in the Nine JQSs
Annotation of candidate genes in the nine JQSs was carried out using SoyBase 8 . 8,57,26,48,37,149,95, and 72 genes, respectively, for a total of 498 genes. In the GO enrichment analysis of 498 genes, in the molecular function category, most genes were enriched in terms related to activity and binding factors such as transcriptional regulator activity, ribonucleoside binding, and purine nucleoside binding ( Figure 3A). In the cell component category, genes were mostly enriched in plasma membrane, organelle envelope, and envelope ( Figure 3B). In the biological process category, the genes were enriched in multicellular organismal process, multicellular organism development, response to chemical, and other terms related to growth, development, and reproduction ( Figure 3C).
Previously published gene expression data for soybean (Severin et al., 2010) were used for gene annotation. Of the 498 genes, we selected 294 located in the nine JQSs with expression in seed tissues; 99 were included in PANTHER protein classes, including 16 that were related to seed weight and morphology. Additionally, two genes were identified by analyzing protein functions in the gene database combined with literature searches (Supplementary Table S4). Ultimately, 18 genes that were mostly related to seed weight and shape were selected as candidate genes for further analysis, including 1, 1, 1, 1, 3, 1, 3, 5, and 2 candidate genes located in respectively. Six of the genes were related to ubiquitin protein ligase; four to protein class of ribosomal protein; four to protein phosphatase; one (Glyma19g37910) to basic leucine zipper transcription factor; and one (Glyma01g26950) to tubulin; and one each encoded RING-H2 finger protein ATL52 and auxin response factor 18 (Glyma04g15820 and Glyma07g06060, respectively) (Supplementary Table S4). The 18 genes have been shown to be related to seed size and shape traits in multiple plant species including soybean; in particular, Glyma06g22900, and Glyma19g37910 have high expression in different seed tissues and low expression in other tissues (Severin et al., 2010), suggesting that they are important for seed development (Figure 3D and Supplementary Table S4). Thus, genes with similar functions in seed development are distributed in nine JQSs, which could account for the close relationship between seed weight and shape traits. In addition, some JQSs had a single candidate gene that conferred multiple traits whereas others had multiple candidate genes conferring several traits, indicating that some genes are pleiotropic and that multiple candidate genes may exist within a JQS.

Genetic Basis of 100-Seed Weight and Seed Shape Traits
The results of this study provide an outline of the genetic structure of the RIL population NJRISX. We identified 42 additive QTLs that contributed 24.9-37.5% to the PV in seed weight and shape traits (100-SW, SL, SW, SP, SA and SLW), as well as 2-4 of 18 epistatic QTL pairs per trait for all six seed traits that contributed 2.5-7.2% of the PV. The remaining PV (35.0-56.7%) was explained by unmapped minor QTLs. The 28 additive and 11 epistatic QTL pairs were located in nine JQSs, suggesting that they were closely related and interacted. In addition, their interaction with the environment contributed 1.4-9.5% and 0.4-0.8% to the PV, respectively, although not large but significant.
Additive QTL is an important genetic component of seed weight and shape traits. Many QTLs associated with soybean seed weight and size have been identified (304 for 100-SW, 29 for SL, 25 for SW, and 18 for SLW in SoyBase 9 ), but few have been examined in detail. Of the 42 QTLs detected in the present study, 8 were harbored in known loci (Supplementary Table S3) and the other 34 were identified for the first time; in 15 of these QTLs, each accounted for >5% of the PV.
Unmapped minor QTLs accounted for a considerable portion of the total PV, implying that more QTLs might be discovered by examining more lines in the population or more markers by improving the efficiency of the mapping procedure. Previous studies have demonstrated that restricted two-stage multilocus genome-wide association study using single nucleotide polymorphism linkage disequilibrium block markers showed a superior performance to CIM, MCIM, joint inclusive CIM, and MLM-GWAS in mapping QTLs associated with days to flowering in soybean Pan et al., 2018). However, this procedure can identity additive QTLs and gene × environment QTLs but not epistatic QTL pairs. For a comprehensive analysis of the genetic structure of seed traits, both procedures can be used on the same dataset or a new mapping procedure can be applied.

Correlation Among 100-Seed Traits and JQS Properties
The five first-order traits (100-SW, SL, SW, SP, and SA) were closely related with correlation coefficients ranging from 0.78 to 1.00, which was supported by the mapping results. Firstly, QTLs conferring different traits were located in the same segment, and could be the same QTL/gene with pleiotropic functions. Of the 42 QTLs, 28 were present in nine JQSs and 6, 3, 5, 7, and 7 QTLs were associated with 100-SW, SL, SW, SP, and SA, respectively. The number of JQSs shared between any two traits was highly consistent with their correlation coefficients ( Table 3); that is, the number of shared JQSs forms the genetic basis of closely related seed traits. Secondly, along with additive QTLs, epistatic QTL pairs of traits were located on the same set of two JQSs, including JQS-6/-19, and JQS-12/-19; the parallel QTL interactions increased the correlation between the two traits. Additionally, additive effects of QTLs in the same JQS were in the same direction (positive or negative), suggesting a pleiotropic effect of the same QTL/gene or an aggregation of multiple QTLs/genes occurring in the same direction in the related traits. In other words, the allele effects contribution of the different QTLs in the same JQS are from a same parent. The number of QTLs and traits in the nine JQSs in the present study varied, indicating that each JQS has unique characteristics and functions. Genes with similar functions in seed development were separated in the JQSs while each JQS also harbored QTLs of genes with similar function, which could underlie the close association between seed traits.
The average segment length of overlapped confidence intervals in the nine JQSs was 3.2 cM (rang: 0.3-11.3 cM)e.g., 11.3 cM for JQS-15 and 1.6 cM for JQS-19. The length depends on the number of QTLs linked at the same location (i.e., the density of markers in the segment). It is expected that by using higher-resolution genome-wide markers, more JQSs can be identified. From the present results, the JQSs can be grouped into three types based on whether individual QTLs in the JQS interact with others. (i) QTLs in the JQS all have epistatic interaction effects and interact with QTLs within the same JQS, resulting in a parallel relationship between traits. For example, every QTL in JQS-6 and -19 except those for SL (there was maybe a random shifting for qSL-19-1) had significant interactions with each other that appeared as interactions between the two JQSs, suggesting that the two segments had two interacting genes when they were actually the same QTL/gene with pleiotropic functions. JQS-12 and -19 also belong to this class. (ii) Some QTLs in JQSs (e.g., JQS-4, -7, and -15) show epistatic interactions but these do not occur in parallel between traits; or else different QTLs in same JQS interact with different JQSs, such as JQS-1. This class of JQSs contains QTLs with distinct properties (i.e., different QTLs for seed weight and seed shape traits) and hence, different genes. (iii) There is no epistatic interaction of QTLs in the JQS (e.g., JQS-11 and -17).
The above JQS results were observed in a set of mutually related traits, and would not be revealed if only a single trait was involved. When QTL mapping was performed for closely correlated traits, QTLs of different traits aggregated on neighboring or overlapping segments on the same chromosome , which are referred to as QTL hotspots (Zhang et al., 2019) or QTL clusters (Cai and Morishima, 2002). Because of limited marker density, QTL clusters were previously defined as regions with different QTLs located in the same or adjacent segments (Xu et al., 2011). While QTL clusters or hotspots have no defined thresholds, JQS is defined as a group of QTLs linked within confidence intervals; thus, using multiple related traits in JQSs can yield more precise and accurate results in the identification of QTLs/candidate genes. However, additional studies are needed to evaluate whether QTL grouping according to the support interval criterion of Lander and Botstein (1989) is appropriate.

CONCLUSION
In this study, we examined the QTL system of six seed weight and shape traits (100-SW, SL, SW, SP, SA, and SLW) in the soybean RIL population NJRISX, and identified 42 additive QTLs and 18 epistatic QTL pairs accounting for 24.9-37.5% and 2.5-7.2% of the PV, respectively, with the remaining part of the PV (35.0-56.7%) attributable to unmapped minor QTLs. Notably, 28 additive QTLs and the 11 epistatic QTL pairs were concentrated in nine JQSs, indicating that seed weight and shape QTLs are closely related and interact; moreover, additive QTLs and epistasis QTL pairs interaction with the environment made a small but significant contribution to the PV (1.4-9.5% and 0.4-0.8%, respectively). Thus, the JQS is important for interpreting the close relationships among the six seed weight and shape traits, especially the five first-order traits. Our findings provide a whole picture of the genetic structure of soybean seed traits and demonstrate that examining a group of closely related traits can be more informative than analyzing individual traits.

AUTHOR CONTRIBUTIONS
ML, GX, JG, LC, JZ, and MR analyzed the data, prepared and analyzed the images, and wrote the manuscript. LC and YX performed phenotype analyses. JZ performed the bioinformatics analysis. ML, LC, XX, and YX planted soybeans in the field. ML,