Characterizing Two Inter-specific Bin Maps for the Exploration of the QTLs/Genes that Confer Three Soybean Evolutionary Traits

Wang, Wubin; Liu, Meifeng; Wang, Yufeng; Li, Xuliang; Cheng, Shixuan; Shu, Liping; Yu, Zheping; Kong, Jiejie; Zhao, Tuanjie; Gai, Junyi

doi:10.3389/fpls.2016.01248

ORIGINAL RESEARCH article

Front. Plant Sci., 23 August 2016

Sec. Plant Genetics and Genomics

Volume 7 - 2016 | https://doi.org/10.3389/fpls.2016.01248

Characterizing Two Inter-specific Bin Maps for the Exploration of the QTLs/Genes that Confer Three Soybean Evolutionary Traits

Wubin Wang^1,2,3,4†

Meifeng Liu^1,2†

Yufeng Wang^1,2

Xuliang Li^1,2

Shixuan Cheng^1,2

Liping Shu⁵

Zheping Yu^1,2

Jiejie Kong^1,2

Tuanjie Zhao^1,2,3,4,6*

Junyi Gai^1,2,3,4,6*

¹Soybean Research Institute, Nanjing Agricultural University, Nanjing, China
²National Center for Soybean Improvement, Ministry of Agriculture, Nanjing, China
³Key Laboratory of Biology and Genetic Improvement of Soybean (General), Ministry of Agriculture, Nanjing, China
⁴State Key Laboratory for Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing, China
⁵BGI-Shenzhen, Beishan Industrial Zone, Shenzhen, China
⁶Jiangsu Collaborative Innovation Center for Modern Crop Production, Nanjing Agricultural University, Nanjing, China

Annual wild soybean (Glycine soja Sieb. and Zucc.), the wild progenitor of the cultivated soybean [Glycine max (L.) Merr.], is valuable for improving the later. The construction of a linkage map is crucial for studying the genetic differentiation between these species, but marker density is the main factor limiting the accuracy of such a map. Recent advances in next-generation sequencing technologies allow for the generation of high-density linkage maps. Here, two sets of inter-specific recombinant inbred line populations, named NJIRNP and NJIR4P, composed of 284 and 161 lines, respectively, were generated from the same wild male parent, PI 342618B, and genotyped by restriction-site-associated DNA sequencing. Two linkage maps containing 5,728 and 4,354 bins were constructed based on 89,680 and 80,995 single nucleotide polymorphisms, spanning a total genetic distance of 2204.6 and 2136.7 cM, with an average distance of 0.4 and 0.5 cM between neighboring bins in NJRINP and NJRI4P, respectively. With the two maps, seven well-studied loci, B1 for seed bloom; G and I for seed coat color; E2, E3, qDTF16.1 and two linked FLOWERING LOCUS T for days to flowering, were detected. In addition, two SB and two DTF loci were newly identified in wild soybean. Using two high-density maps, the mapping resolution was enhanced, e.g., G was narrowed to a region of 0.4 Mb on chromosome 1, encompassing 54 gene models, among which only Glyma01g40590 was predicted to be involved in anthocyanin accumulation, and its interaction with I was verified in both populations. In addition, five genes, Glyma16g03030, orthologous to Arabidopsis Phytochrome A (PHYA); Glyma13g28810, Glyma13g29920, and Glyma13g30710 predicted to encode the APETALA 2 (AP2) domain; and Glyma02g00300, involved in response to red or far red light, might be candidate DTF genes. Our results demonstrate that RAD-seq is a cost-effective approach for constructing high-density and high-quality bin maps that can be used to map QTLs/genes into such small enough regions that their candidate genes can be predicted.

Introduction

Soybean [Glycine max (L.) Merr.], is one of the most economically important leguminous seed crops. Many important traits in soybeans were complex quantitative traits controlled by multiple QTLs (quantitative trait loci)/genes and environments. To improve the effectiveness of marker-assisted breeding in soybean, the mapping resolution needs to be enhanced. High-density genetic maps, one of the most valuable genomic resources, can largely reveal the genome composition and meet the above requirement. Thus, the construction of a high-density linkage map for soybean is necessary for its future study and production.

In the past two decades, there have been a number of reports on the construction of soybean genetic maps. As a permanent population, recombinant inbred lines (RILs) have been mainly used for this research in soybean because they can be evaluated over space and time and used indefinitely, facilitating the exchange, accumulation, and sharing of genetic information (Mansur et al., 1996). Using an RIL population from a cross between two soybean cultivars, an initial genetic map defining 1550 cM of the soybean genome comprising 31 linkage groups consisting of 132 RFLPs, isozymes, and morphological and biochemical markers was reported in Lark et al. (1993). Since this initial linkage map, researchers have developed more than 40 linkage maps. Our research group also made great advancements in linkage map construction and developed more than 10 maps, whose typical representative was composed of 452 markers in 21 linkage groups and covered 3,595.9 cM of the soybean genome (Zhang et al., 2004). The above genetic maps have provided tools for understanding the genetic bases of soybean quantitative traits but have mainly focused on cultivated soybeans, while only few studies have investigated inter-specific crosses, which facilitate the detection and use of favorable exotic genes to broaden the cultivar’s genetic bases.

Meanwhile, the marker densities of linkage maps were low, except for those of GmComposite2003 (Song et al., 2004) and GmConsensus40 (Hyten et al., 2010), which integrated five and three different populations, respectively. Recent advances in next-generation sequencing technologies have provided effective platforms for the detection of high-quality single nucleotide polymorphism (SNP) markers for the genotyping of mapping populations (Varshney et al., 2009), and genotyping-by-sequencing (GBS) has been successfully used for genetic studies in a variety of species (Wu et al., 2010; Chen et al., 2014; Li et al., 2015). However, the limitations to GBS include a relatively large proportion of missing data, and a small but rarely corrected percentage of SNP genotyping sequencing errors always occur during GBS. One option for imputing missing SNP data is the sliding-window approach, where adjacent SNPs with the same genotype in an interval are combined into bins that demarcate recombination locations across the whole population (Huang et al., 2009). The bin-map method is more powerful for detecting QTL than traditional methods and has been employed for the fine mapping of yield-associated loci in rice (Yu et al., 2011), maize (Chen et al., 2014), and others. In soybean, Xu et al. (2013) constructed the first bin maps using 246 RILs derived from a G. max cross between Magellan and PI 438489B and fine mapped a root-knot nematode resistance QTL into a bin of 29.7 kb. However, the advantages of high-density linkage mapping have not yet been demonstrated in the soybean inter-specific cross.

Annual wild soybean (Glycine soja Sieb. and Zucc.) is acknowledged as the wild progenitor of cultivated soybean (Broich and Palmer, 1980). During evolution, the seed bloom (SB) in the wild soybean disappeared, and the seed coat color (SCC) became yellow in the released cultivars. During adaptation to their environments, soybeans with variable days to flowering (DTF) were selected. So far, the genetic bases of these three evolutionary traits, SB, SCC and DTF, have been well-studied in cultivated soybeans. For SB, three genes, B1, B2 and B3, have been reported (Palmer et al., 2004), while only B1 has been mapped to a large region of chromosome (Chr.) 13 (Chen and Shoemaker, 1998). At least five genetic loci (I, T, W1, R, and O) are known to control SCC (Palmer et al., 2004), but they represent large intervals, except for I, which was located in a 103-kb gene-rich region harboring a cluster of chalcone synthase genes on Chr. 08 (Clough et al., 2004). For DTF, remarkable progress has been made in identifying genes, including E1–E8 (Bernard, 1971; Buzzell, 1971; Buzzell and Voldeng, 1980; Mcblain and Bernard, 1987; Bonato and Vello, 1999; Cober and Voldeng, 2001; Cober et al., 2010), J (Ray et al., 1995), and QTLs (Mansur et al., 1996; Orf et al., 1999; Yamanaka et al., 2000; Tasma et al., 2001; Funatsuki et al., 2005; Reinprecht et al., 2006)¹. However, the allele differentiations between wild and cultivated soybeans in the above evolutionary traits are largely unknown.

In the present paper, to improve the resolution of QTL mapping and knowledge of the evolutionary process from the wild soybean to cultivated soybean, two sets of ultra-high-density linkage bin maps derived from inter-specific crosses were constructed through restriction-site-associated DNA sequencing (RAD-seq). Thus, the genetic bases of three evolutionary traits, SB, SCC and DTF, were dissected. The results indicated that the availability of such a high-density linkage map would provide breeders and geneticists with a much-wanted tool to identify a smaller genomic region associated with the wild allele for a target trait and to analyze the allele differences between wild and cultivated soybeans to broaden the genetic bases of cultivated soybeans.

Materials and Methods

Plant Materials and Phenotyping

To analyze the genetic difference between wild and cultivated soybeans, two inter-specific populations, NJIRNP and NJIR4P, composed of 284 and 161 RILs, respectively, were developed by five cycles of single seed descent from F₂ populations at the Jiangpu Experiment Station of Nanjing Agricultural University. NJIRNP was derived from a cross between Nannong86-4 (G. max) and PI 342618B (G. soja), while NJIR4P was derived from a cross between Nannong493-1 (G. max) and PI 342618B. Nannong86-4 and Nannong493-1, two cultivars in maturity group (MG) VI with days to maturity between 166 and 180 descripted by Gai et al. (2001), characterized by shiny SB, yellow SCC, later flowering, high yield, and high oil content were used as female parents, while PI 342618B, an annual wild soybean in MG I characterized by SB, black SCC, early flowering, high protein content, low 100-seed weight (1.1 g), and tolerance to flooding was used as a shared male parent.

NJRINP and NJIR4P, along with their parents, were grown in a complete randomized block design with three replications, one row per plot, and 100 cm between rows at Jiangpu Experiment Station in June 2010, 2011, 2012, and 2013 and at Anhui in 2012. Field management was performed under normal conditions. Three traits, SB, SCC and DTF, were investigated on a plot basis. For SB, the grading scales were B = Bloom (heavy coating of powdery substance adhering to the seed coat), S = Shiny (absence of bloom), and I = Intermediate (between bloom and shiny). For SCC, the grading scales were Y = Yellow, G = Green, Bl = Black, and Br = brown. DTF was scored as the days from sowing to the first flowering, which corresponded to the R1 developmental stage (Fehr et al., 1971).

DNA Extraction and SNP Identification by Sequencing

The three parents were re-sequenced and the two RIL populations were RAD-sequenced at the Beijing Genomics Institution (Shenzhen, China). Genomic DNA from the fresh leaves of the two populations and parents was extracted using modified CTAB method (Murray and Thompson, 1980). Whole-genome re-sequencing was applied for the three parents. Sequencing libraries were constructed according to the manufacturer’s instructions (Illumina).

According to the protocol descripted by Chutimanitsakun et al. (2011) and Wu et al. (2014), the RAD-sequencing libraries were constructed to sequence the RIL population. Total of five main steps were included. First, 1 μg purified genomic DNA of each RIL was digested by 10 units of TaqI restriction enzyme, for 60 min at 37°C in a 50 μl reaction system, then the product was heat inactivated for 30 min at 65°C. Second, the 30 μl digestion products were ligated with 3.0 μl of 100 mM P1 adapter by 0.5 μl T4 DNA ligase (10 U/μl, NEB), along with 0.6 μl of 100 mM ATP, 6 μl 10× NEB Buffer, 5 μl nuclease free water and incubated at room temperature for 20 min. The P1 adapter was with the molecular identifier, sticky-end matching the TaqI cleavage site (TaqI-P1 top oligo: 5′-AATGATACGGCGACCACCGAGATCTACA CTTTCCCTACACGACGCTCTTCCGATCT ××××× TGCA-3′, TaqI-P1 bottom oligo: 5′-phos ××××× AGATCGGAAGAGCGTCGTGTAGGGAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT-3′, ‘xxxxxx’ means molecular identifier). Third, every 24 RILs were pooled together and randomly sheared ultrasonically [Covaris S2 (Covaris, Inc.)], and QIAquick PCR Purification kit was applied for purifying sheared DNA fragments. The purified DNA was loaded on a 1.5% agarose gel with 100 bp DNA ladder and 400–600 bp DNA bands were cut from the gel and purified with MiniElute Gel Purification Kit (Qiagen). Fourth, the fragment end was repaired with Quick Blunting kit (NEB). A 3′-dA overhang was added using dA-tailing module (NEB). Then P2 adapter (top: 5′-phos-GATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCAGAACAA-3′, bottom: 5′-CAAGCAGAAGACGGCATACGAGATCGGTCTCGGCATTCCTGCTGAACCGCTCTTCCGATCT-3′) was ligated to the sticky end with an overhanging A. Fifth, the collected fragments were enriched by PCR amplification (Forward primer: 5′-AATGATACGGCGACCACCGA-3′, reverse primer: 5′-CAAGCAGAAGACG GCATACGA-3′) and purified by QIAquick PCR purification kit. Finally, each sample was normalized to 10 nM, and was quantified on an Agilent 2100 Bioanalyzer and sequenced on HiSeq 2000 instruments following the manufacturer’s protocol.

Short reads were separated into each sample using a 6-bp barcode. All of the sequence reads were aligned against the reference soybean genome Wm82 (Schmutz et al., 2010) using SOAP2 (Li et al., 2009). SOAPsnp (version 1.01) was used for SNP calling in the three parents. For the RIL population, RealSFS (Yi et al., 2010) was used for SNP calling based on the Bayesian estimation of locus frequency. The software fastPHASE (Scheet and Stephens, 2006) was used for genotyping SNP imputations after heterozygous alleles were turned into missing alleles. The candidate SNPs were filtered with the following critical criteria: (1) SNPs existed between two parents; (2) sites with a probability >0.95; and (3) whole depth >40 and <2000. The final set of SNPs was used to construct bin markers.

According to the three parents’ sequences, small insertion and deletion (InDel) calling was performed according to a previously described method. Alignment was processed with SOAP2 allowing 5-bp gaps. A three-read-supported site was recognized as an InDel.

RIL Genotyping and Bin Map Construction

High-density linkage maps of populations with high linkage disequilibrium contain many redundant markers that provide no new information but increase the computational requirements of mapping. Furthermore, a small percentage of genotypes are falsely called due to sequencing error. To address these issues, a modified version of the sliding-window approach developed by Huang et al. (2009) was applied. A sliding-window approach was used for RIL genotyping with a group of 15 consecutive SNPs and a step size of 2. The position of a SNP that switched one genotype to another consecutive genotype was recorded as the recombination breakpoint. A skeleton bin map was constructed according to the recombination breakpoint information. All of the breakpoints along the same chromosome were sorted by physical position. Consecutive 30-kb intervals that lacked a recombination event within the population were joined into bins, and the bins were used as markers. The linkage map was constructed using JoinMap 3.0 (van Ooijen and Voorrips, 2002) with Kosambi’s mapping function.

Gene/QTL Mapping

For the two qualitative traits SB and SCC, the genes were mapped using the Chi-square test. The Bonferroni-corrected significance levels (0.05/bin number) were approximately 10^-6 and 10^-5 for NJRINP and NJRI4P, respectively. For DTF, the QTL analysis was performed on WinQTLCart 2.5 software (Wang et al., 2003) with the model of composite interval mapping. A 10-cM window at a walking speed of 1 cM was used in a stepwise forward regression procedure. The LOD threshold was calculated using 1000 permutations for an experimental-wise error rate of P = 0.05. The additive effects (Add) and phenotypic variance explained by QTL (R²) were estimated according to the bin at the highest peaks as determined by WinQTLCart 2.5. More than one QTL with overlapping one-lod-support intervals was treated as one QTL in the present paper.

Results

Establishment of the Genomic Bin-Marker System in the Two RIL Populations

SNP Identification in the Three Parents through Re-sequencing

With the whole-genome re-sequencing method, a total of 123.52, 134.42, and 110.36 M reads were generated for PI 342618B, Nannong86-4 and Nannong493-1, respectively. The short paired-end reads were mapped back to the Williams 82 reference genome (Schmutz et al., 2010) to determine the physical positions of each SNP. A total of 10.01, 11.27, and 9.13 Gb bases were identified, equivalent to a 9.89-, 10.69-, and 9.13-fold depth of PI 342618B, Nannong86-4 and Nannong493-1, respectively, with an average coverage of 86.48, 93.82, and 87.57% (Supplementary Table S1). Then, the genetic differences between the parents of the RIL populations were investigated. As shown in Table 1, total of 828,796 SNPs were identified between PI 342618B and Nannong86-4, while 736,076 SNPs were identified between PI 342618B and Nannong493-1. Only 251,612 SNPs were detected between Nannong86-4 and Nannong493-1. We also detected approximately 209,551 InDels between PI 342618B and Nannong86-4, while 162,452 were detected between PI 342618B and Nannong493-1. This shows that there was great genetic diversity between the cultivar and wild parents, with Nannong86-4 and PI 342618B having much more, as Nannong86-4 is an elite cultivar and Nannong493-1 is a landrace. These markers that were identified between parents of RILs are useful for QTL fine mapping.

TABLE 1

TABLE 1. The genetic differences among the parents used to construct NJRINP and NJRI4P.

SNP Identification of the RILs through RAD-seq

The method of RAD-seq was used to sequence each RIL. Restrict enzyme fragments ranging from 400 to 600 bp for 445 RILs were sequenced, generating a total of 1262.82 million paired-end reads (108.36 Gb), ~238.15 Mb for each line (Supplementary Figure S1). Paired-end reads were aligned to their respective parents. The complex genome composition with a high percent of repetitive DNA and nearly 75% of the genes existing in multiple copies might confuse read alignment, causing incorrect SNP calling; thus, only unique mapping reads were retained for subsequent analysis. The unique mapped sites accounted for approximately 4% of the whole genome, and the depth of each site was approximately fourfold in each individual on average. Although the unique mapping reads only covered approximately 4% of the whole genome, the SNP sets of the populations occupied up to 10% of the total SNPs between parents, that is 89,680/828,796 SNPs for NJIRNP and 80,995/736,076 SNPs for NJIR4P, approximately 10.8 and 12.2 kbp per SNP throughout the whole genome, respectively (Table 2; Supplementary Figures S2 and S3). The SNPs identified on different chromosomes ranged from 3,315 on Chr. 13 to 7,103 on Chr. 18, 2,534 on Chr. 16 and 6,104 on Chr. 19 in NJIRNP and NJIR4P (Table 2), respectively. This indicates that our sequencing method could capture high-density SNPs, as well as decrease the genome complexity.

TABLE 2

TABLE 2. Distribution of SNPs on the 20 soybean chromosomes in the two RIL populations.

Establishing a Bin-Marker System in the Two RIL Populations

It is difficult to construct a genetic map with more than 80,000 SNPs. A bin map is one of the most effective methods to decrease work and increase efficiency. As described in the methods, a sliding-window approach with a group of 15 consecutive SNPs was chosen to determine recombination breakpoints. According to the breakpoint information of each line, we identified 5,728 and 4,354 bin markers in NJRINP and NJRI4P (Figures 1A,B), respectively, which was more than that of the intra-specific RIL population (3,509 bins) identified using whole-genome re-sequencing technology (Xu et al., 2013). In NJRINP, the length of the bin markers ranged from 1.5 kb to 17.7 Mb, with a mean of 165.8 kb, and 80.9% of the bin markers were less than 150.0 kb in length, among which 116 bins that were larger than 1.05 Mb in size and two large bins that were more than 10.0 Mb were dispersed on Chr. 08 (bin2145) and Chr. 19 (bin5230; Figure 1C; Supplementary Table S2). In NJRI4P, the length of the bin markers ranged from 30.0 kb to 18.9 Mb, with a mean of 218.2 kb, and 70.8% of the bin markers were less than 150.0 kb in length, among which 125 bins that were larger than 1.05 Mb in size and three large bins that were greater than 10.0 Mb were dispersed on Chr. 08 (bin1798), Chr. 18 (bin3875), and Chr. 19 (bin4044; Figure 1C; Supplementary Table S2).

FIGURE 1

FIGURE 1. Recombination bin maps of NJRINP and NJRI4P populations. (A) Bin map of NJRINP derived from the cultivated soybean Nannong86-4 and wild soybean PI 342618B. The bin map consists of 5,728 bin markers inferred from 89,680 high-quality SNPs in 284 RILs. The physical position is based on the reference genome Williams 82. Chromosomes are separated by vertical lines. Red, green, and blue represent Nannong86-4, PI 342618B and heterozygote genotypes, respectively. (B) Bin map of NJRI4P derived from the cultivated soybean Nannong493-1 and wild soybean PI 342618B. The bin map consists of 4,354 bin markers inferred from 80,995 high-quality SNPs in 161 RILs. Red, green, and blue represent Nannong493-1, PI 342618B and heterozygote genotypes, respectively. (C) Distribution of the physical lengths of the bins in the NJRINP and NJRI4P populations. (D) Distribution of the genetic distances between neighboring bins in the NJRINP and NJRI4P populations.

For NJRINP, 51.80% of all genotypes were inherited from the maternal parent Nannong86-4, 45.54% were from PI 342618B, and 2.66% were heterozygous. In contrast, for NJRINP, 46.07% were inherited from Nannong493-1, 50.06% were from PI 342618B, and the remaining 3.87% were heterozygous genotypes. The segregation ratios of each bin marker were calculated, and some significant segregation distortion regions were presented in both populations. In NJRINP, 176 of 5728 bins showed extreme segregation distortion at a significant level of P < 0.0001, distributed on Chr. 02, Chr. 16, and Chr. 19 (Supplementary Figure S4). In NJRI4P, one bin presented extreme segregation distortion at P < 0.0001, and two bins exhibited segregation distortion at P < 0.0005 (Supplementary Figure S4). Few markers showed segregation distortion among the two populations, indicating that the markers and the population were suitable for constructing a linkage map.

Construction and Characterization of the Two Linkage Maps with Genomic Bin Markers

Two high-resolution linkage maps were constructed using genomic bin markers in JoinMap 3.0 software (van Ooijen and Voorrips, 2002). The linkage map of NJRINP was composed of 5,728 bin markers distributed on 20 chromosomes, spanning a total distance of 2,204.6 cM (Table 3; Supplementary Figure S5). The genetic distance between neighboring bins was approximately 0.4 cM, ranging from 0.0 to 7.5 cM, 95.1% of which were less than 0.5 cM, and there were only six regions greater than 3.0 cM between neighboring bins dispersed on Chr. 05 (bin1155-bin1156, bin1294-bin1341, bin1161-bin1077, bin1154-bin1171, and bin1171-bin1172) and Chr. 13 (bin3346-bin3458; Figure 1D; Table 3).

TABLE 3

TABLE 3. Number of bin markers in each linkage group and the average distance.

Another linkage map of NJRI4P was composed of 4,354 bins, covering 2,136.7 cM (Table 3; Supplementary Figure S6). There was no obvious difference in length between the two linkage maps. However, the average genetic distance between neighboring bin markers in NJRI4P was greater than that of NJRINP, with a value of 0.5 cM, and only 59.0% of interval regions were less than 0.5 cM (Figure 1D; Table 3) due to the small population size (161 lines) used in the construction of the linkage map. There were seven regions greater than 3.0 cM between two neighboring bins dispersed on Chr. 02 (bin228-bin229, bin322-bin323), Chr. 05 (bin1119-bin1118, bin986-bin925), Chr. 13 (bin2809-bin2808, bin2728-bin2810), and Chr. 20 (bin4263-bin4264; Supplementary Figure S6).

To compare the order of the bin markers, a dot-plot diagram (Supplementary Figures S7 and S8) was generated using the physical position of each marker on the Williams 82 genome against its genetic position on the linkage groups. According to the analyses of the two linkage maps, most of the markers showed good linear agreement between physical and genetic maps based on the basic framework. However, there were also five inverted regions, all of which were located at the end of the chromosomes, including the upper end of Chr. 05 and Chr. 13 and the lower end of Chr. 05, Chr. 11, and Chr. 14. The genome variations among different species might be the first reason for this non-uniformity.

The above results indicate that the features of the bin linkage maps were as follows: (i) the density of markers was high, totaling 5,728 and 4,354 bins in NJRINP and NJRI4P, respectively, with average genetic distance between neighboring bins of approximately 0.4 and 0.5 cM; (ii) evenness throughout the whole genome, with and average genetic distance between neighboring bins from 0.0–7.5 to 0.0–14.1 cM in the respective populations, 95.1 and 59.0% of which were less than 0.5 cM; and (iii) more bin markers in the inter-specific RIL populations. However, the bins relative to the RIL population were non-comparable among populations but comparable based on the reference genome.

Mapping the Wild/Cultivated Gene/Alleles for Seed Bloom and Seed Coat Color

In the two RIL populations, two qualitative traits, SB and SCC, were classified into three and four grades (Table 4; Figure 2A), respectively. As shown in Table 4, for SB, 194, 40, and 42 lines were grouped into S (shiny, absence of bloom), B (bloom, heavy coating of powdery substance adhering to the seed coat) and I (intermediate, between shiny, and dull) in NJRINP, respectively, and the respective numbers were 79, 43, and 35 in NJRI4P. The associated loci of the two qualitative traits were mapped by the Chi-square test. As shown in Figure 2B and Table 5, a total of three genes were detected associated with SB in the two populations, among which gSB13 was novelly mapped into a small region on Chr. 13, with the 1-LOD support interval of 33,911,568–34,322,119 bp in NJRINP and 33,689,175–34,469,255 bp in NJRI4P, corresponding to the locus of B1 reported in a large genome region (Chen and Shoemaker, 1998). The other two genes of SB, gSB8 and gSB15, were detected only in NJRINP, located in intervals of 8,144,404–8,643,358 bp on Chr. 08 and 22,763,999–42,462,013 bp on Chr. 15, respectively. The most significant bins in the 1-LOD interval of genes were used to analyze the effects of the wild/cultivated alleles. As shown in Table 6, all of the wild alleles (w) on the three loci were associated with the phenotype of I and B, except for gSB15, on which the wild allele occurred in two opposite directions at low frequency (0.10) in the I group and at high frequency (0.85) in the B group.

TABLE 4

TABLE 4. Phenotypic distribution of three domestic traits among NJRINP and NJRI4P.

FIGURE 2

FIGURE 2. Mapping and analysis of wild alleles for seed bloom (SB) and seed coat color (SCC) using NJRINP and NJRI4P. The phenotypes of three parents and the grading scales of SB and SCC in NJRINP derived from Nannong86-4 (P1) and PI 342618B (P2) and NJRI4P derived from Nannong493-1 (P3) and PI 342618B (P2) are shown in (A). The grading scales were as follows: S = shiny (absence of bloom), I = intermediate (between shiny and bloom), and B = bloom (heavy coating of powdery substance adhering to the seed coat) for SB, and Y = yellow, G = green, Bl = black, and Br = brown for SCC. The Manhattan plots on the left of the graph depict the extent of the association of 5,728 and 4,354 bin markers dispersed as shown over the 20 soybean chromosomes with SB (B) and SCC (C) of NJRINP and NJRI4P, respectively. The -log₁₀P value is a measure of the degree to which a bin is associated with the trait based on the Chi-square test.

TABLE 5

TABLE 5. Genes/QTLs associated with three domestic traits in the two populations.

TABLE 6

TABLE 6. Analysis the effect of wild allele of the genes conferring SB and SCC.

With the same method as used for SB, a total of three genes were detected for SCC in the two populations. Among them, gSCC1 and gSCC8 corresponding to the two SCC genes, G (Song et al., 2004) and I (Ohnishi et al., 2011), respectively, were mapped into two small regions of 52,279,678–52,646,512 and 8,346,892–8,643,359 bp in NJRINP and 52,032,052–52,462,360 and 8,434,875–8,540,484 bp in NJRI4P on Chr. 01 and Chr. 08, respectively (Figure 2C; Table 5). The effects of wild alleles on the two loci are shown in Table 6. In NJRINP, a high frequency of the wild allele (0.86) of G was observed in 65 RILs with green SCC, and the wild allele of I was associated with black SCC with a frequency of 0.98 in 116 black SCC RILs. The fact that the 50 RILs of NJRINP with the genotype w/w at the loci of G and I were associated with the black SCC instead of the green SCC indicates that there must be interaction between these loci, and this interaction was verified in NJRI4P. G and I can explain 93.6% of the phenotypic variation in NJRINP, including 57 yellow, 56 green, 50 black, and 58 black SCC lines of a total of 221 of 236 lines; the third gene, gSCC16, detected on Chr. 16 in NJRINP, whose wild allele is associated with both green and black SCC, might be a false-positive gene for SCC. In summary, two genes, G and I, were detected in the two mapping populations, G was mapped to a smaller region than that previously reported, and the interaction between these genes was verified first at the molecular level.

Mapping the Wild/Cultivated QTL Alleles Conferring Days to Flowering

Phenotypic and Genotypic Variation of Days to Flowering

Days to flowering was evaluated for NJRINP and NJRI4P populations under four and three environments, respectively. As shown in Table 4, large differences and broad phenotypic variations were observed between the cultivated and wild parents and among their inter-specific RIL populations in DTF (Supplementary Figure S9). For NJRINP, the ranges of DTF values were 48.3–63.3, 32.5–49.0, 32.3–56.3, and 34.0–60.0 days in the environments of JP2010, JP2011, AH2012, and JP2013, respectively, with means of 50.8, 37.2, 42.6, and 44.3 days. For NJRI4P, the ranges of DTF were 36.7–60.0, 40.0–54.0, and 34.0–56.7 days in AH2012, JP2012, and JP2013, respectively, with means of 44.7, 48.1, and 41.3 days. The high heritability of DTF from 89.3 to 95.1% suggests that genetic variation accounted for a major part of the phenotypic variance in the populations. A joint ANOVA of the data from multiple environments showed that the variations among RILs, environments, and RIL × environment interactions were all significant (Supplementary Table S3). Therefore, QTL mapping was performed separately based on individual environments.

QTL Mapping of Days to Flowering

With the composite interval mapping using WinQTLCart 2.5 software (Wang et al., 2003), a total of three putative QTLs were identified in the NJRINP (Table 5; Figure 3A). The QTL qDTF10 in the region of 44,593,532–45,047,245 bp on Chr. 10, overlapping with the previously cloned gene E2, had the highest genetic contribution values (R²) ranging from 26.4 to 44.3% in the four different environments. qDTF16.2 in the region of 3,966,960–4,570,856 bp on Chr. 16 and qDTF19 in the region of 47,310,112–47,815,831 bp on Chr. 19 were stable across the four environments, with R² values ranging from 8.3–13.5 to 11.1–19.2%, respectively, and overlapped with two linked FLOWERING LOCUS T homologs of Arabidopsis thaliana (GmFT5a and GmFT3a; Kong et al., 2010) and E3, respectively. The wild alleles on these loci were associated with additive effects from -2.6 to 3.3 days. To verify the allelic differences on each locus, the wild allele effects of DTF were also roughly estimated from the individuals with and without the allele according to the most significant bin marker genotypes. As shown in Figure 3A, there were significant differences between wild and cultivated alleles at each locus, with the G. soja allele (w) causing a consistently later DTF than that of the G. max allele over time on qDTF10 and earlier DTF on qDTF16.2 and qDTF19.

FIGURE 3

FIGURE 3. Wild alleles and their effects associated with days to flowering (DTF) in NJRINP and NJRI4P. (A) Wild alleles and their effects associated with DTF in NJRINP derived from Nannong86-4 and PI 342618B. (B) Wild alleles and their effects associated with DTF in NJRI4P derived from Nannong493-1 and PI 342618B. A total of six wild alleles/QTLs related to DTF were detected in the two populations, among which qDTF10 and qDTF19 could be detected in both of the populations under all environments, including JP2010, JP2011, JP2012, AH2012 and JP2013, and encompass the cloned genes E2 and E3, respectively. The locus qDTF16.2, detected only in NJRINP under four environments, also encompassed the two linked FLOWERING LOCUS T genes GmFT5a and GmFT3a. The other three QTLs, including two newly identified QTL/alleles, qDTF02 and qDTF13, and one previously reported locus, qDTF16.1, were detected only in NJRI4P. The allele effects were analyzed according to the genotype of lines based on the most significant bin marker, and two letters, w and c, represent the wild and cultivated alleles, respectively.

In the NJRI4P population, a total of five QTLs were detected in the three environments (Table 5; Figure 3B), located on Chr. 02, 10, 13, 16 and 19, with R² values ranging from 4.9 to 19.5%. All of the QTLs could be detected in at least two environments. Among these QTLs, qDTF10 and qDTF19 corresponded to the reported genes E2 and E3, respectively, and qDTF16.2 spanned two physical intervals of 1–1,139,451 and 751,606–3,823,243 bp in the environments of JP2012 and JP2013 and overlapped with the previously reported QTL Fflr 13-8 (Pooprompan et al., 2006). However, these QTLs were narrowed down to intervals of 1.1 and 3.1 Mb, respectively, in the NJRI4P population. The other two QTLs, qDTF02 and qDTF13, were newly detected in the wild soybean PI 342618B and mapped to intervals of 1–233,022 and 30,886,271–33,458,132 bp on Chr. 02 and Chr. 13, respectively. The wild alleles at the five loci were associated with additive effects from -1.5 to 2.0 days based on the mapping software, consistent with the allele effects of DTF that were roughly estimated from the individuals with or without wild alleles (Figure 3B). Compared to the cultivated alleles, the G. soja alleles (w) caused a consistently later DTF on qDTF02, qDTF10, and qDTF13 and an earlier DTF on qDTF16.1 and qDTF19.

In conclusion, a total of six QTLs were detected for DTF. In previous research, three QTLs, qDTF10, qDTF16.1 and qDTF19, could be detected in other mapping populations derived from G. max soybeans, such as ‘Misuzudaizu × Moshidou Gong 503’ for qDTF10 and qDTF19 (Yamanaka et al., 2001) and ‘AGS292 × K3’ for qDTF16.1 (Pooprompan et al., 2006), indicating that allele differentiations existed not only between G. max and G. soja at these loci for DTF but also among G. max soybeans. The QTL qDTF16.2 overlapped with two linked FLOWERING LOCUS T homologs of A. thaliana (GmFT5a and GmFT3a; Kong et al., 2010). The other two QTL, qDTF02 and qDTF16.1, were newly identified in the wild soybean, which indicated that allele differentiations were not detected in cultivated soybeans at these loci for DTF and that these wild alleles might have the potential to broaden the genetic basis of cultivated soybean in DTF.

Candidate Gene Prediction for the Three Evolutionary Traits Using the High-Density Bin Maps

Seed coat pigmentation is induced by the deposition of a number of flavonoids in the seed coat of soybean, and the synthesis of these compounds mainly derives from an anthocyanin biosynthesis branch of the phenylpropanoid pathway of secondary metabolism (Yang et al., 2010). According to the mapping results in the present study, there were a total of 54 gene models in the interval of gSCC1 (G) (Table 5), among which only one gene model of Glyma01g40590 was predicted to involve in anthocyanin accumulation and thus might be the most likely candidate gene of gSCC1 (G). According to the gene annotation database at SoyBase¹, the small physical interval of gSCC8 in the NJRINP population encompassed only 63 gene models, among which Glyma08g11520, Glyma08g11530, Glyma08g11610, Glyma08g11620, Glyma08g11630, and Glyma08g11650 were predicted to encode Chalcone and Stilbene synthase family proteins and thus might be the candidate genes of gSCC8 (I) (Table 5).

The key genes and distinct pathways involved in flowering have been identified through many Arabidopsis studies. Phytochromes are red-light- and far-red-light-absorbing photoreceptors in the leaves of most plants (Chen et al., 2004). Mutations in the phytochrome genes cause altered flowering phenotypes in Arabidopsis and soybean (Reed et al., 1994; Liu et al., 2008). In the region of the novel locus qDTF16.1 detected in the wild soybean, there was only one soybean gene (Glyma16g03030) orthologous to Arabidopsis SPA1 (suppressor of phyA protein family). APETALA 2 (AP2) is related to various aspects of plant development, including flowering and seed mass control (Jofuku et al., 2005). In the interval of qDTF13, three gene models, Glyma13g28810, Glyma13g29920 and Glyma13g30710, were predicted to encode the AP2 domain. A total of 19 gene models were located in the interval of qDTF02, and although there were no homologs of genes for Arabidopsis flowering, one gene model, Glyma02g00300, was predicted to be involved in the response to red or far red light. These candidate genes are worth investigating in future research.

Discussion

The Features of a RAD-seq Based Genome-Wide Inter-specific Bin Linkage Map

The marker density is high in the two inter-specific linkage maps. In the past two decades, at least 40 genetic maps have been constructed¹ that provide good tools for investigating the soybean genome. However, many of them were constructed based on morphological, cytological, biochemical and SSR markers, among others, which are limited in number or are difficult to genotype if 100 or 1000s of markers are involved in a large population. Because there have been few markers in most previous research, the QTLs of target traits were mapped to large intervals of more than 10 cM. Thus, the bottlenecks of soybean map-based cloning or fine mapping included the insufficiency of molecular markers and the lack of highly efficient genotyping approaches, as suggested by Xu et al. (2013). Recent advances in next-generation sequencing technologies have provided effective platforms for the direct detection of high-density SNP markers, which are the most abundant DNA variations, to genotype a mapping population (Varshney et al., 2009; Chen et al., 2014; Li et al., 2015). Using whole genome re-sequencing technology, a linkage map composed of 3,509 bin markers had been constructed using two G. max accessions (Xu et al., 2013). Because of the expense of the whole-genome re-sequencing of a large population, a RAD-seq technology was employed in the present study. According to the breakpoint information of each line, the highest-density linkage maps were constructed, including 5,728 and 4,354 bin markers in NJRINP and NJRI4P, respectively, which were denser than those in previous studies. The above results suggest that the RAD-seq-based bin strategy was efficient for the generation of a high-density linkage map.

There were many polymorphisms at the phenotype and genome levels among the two populations. Although, there are many advantages using an inter-specific cross to a construct linkage map (Miller and Tanksley, 1990), the number of inter-specific maps is much lower because of the laborious process of their development. In the present research, two ultra-high-density genetic maps were constructed using two cultivated soybeans and a shared wild soybean. According to the results, there were a wide range of polymorphisms between the two inter-specific parents, including DNA and phenotype. At the DNA level, 828,796 and 736,076 SNPs were polymorphic between Nannong86-4 and PI 342618B and between Nannong493-1 and PI 342618B, respectively, much more than that (251,612) between Nannong86-4 and Nannong493-1. This indicates that nucleotide diversity decreased remarkably during the domestication process from wild to cultivated lines. This might be the reason why much more bins were contained in the present populations than in the population derived from the two cultivars Magellan and PI 438489B (Xu et al., 2013), although whole-genome sequencing was used to identify SNPs in the population. The polymorphisms between the parents in phenotype determine the application scope of the mapping population. In the present study, there were large differences between Nannong86-4, Nannong493-1 and wild soybean PI 342618B, including the color of the flower and seed coat, stem growth habit, DTF, seed size, protein and oil contents, stress tolerance and so on. Thus, the two populations would have wide ranges of applications.

The two linkage maps had high quality and mapping accuracy. Compared to the reference genome Wm82, the orders of most bins in each linkage map of NJRINP and NJRI4P showed good linear agreement with their physical positions. Using the two inter-specific maps, seven well-studied loci, B1 for SB (Chen and Shoemaker, 1998), G (Song et al., 2004) and I (Ohnishi et al., 2011) for SCC, and E2 (Watanabe et al., 2011), E3 (Watanabe et al., 2009), qDTF16.1 (Pooprompan et al., 2006) and two linked FLOWERING LOCUS T (Kong et al., 2010) for DTF, were detected, indicating the high quality of the two maps. The mapping accuracy was effectively improved, and few genes were contained in the 1-LOD interval of genes/QTLs. If the most significant bin marker that is associated with trait is used to predict candidate genes, the number of genes will be much lower. Thus, in the present paper, the candidate genes of traits can be predicted for subsequent deep research, such as gene cloning and gene functional verification, e.g., G was narrowed to a region of 0.4 Mb on Chr. 01, encompassing 54 gene models, among which only Glyma01g40590 was predicted to be involved in anthocyanin accumulation. Due to the high resolution of gene/QTL mapping, interactions between genes/QTLs can be detected. For example, the interaction of two SCC genes, G and I, were first detected and analyzed at the molecular level in the present study and verified by both of the populations.

The Regions of Differentiation between Wild and Cultivated Soybean with Regard to Two Qualitative Traits and One Quantitative Trait

The classic characteristics of the wild soybean seed coat are bloom and black color. Through evolution and artificial selection, the SB disappeared and SCC became yellow, but the location of these wild alleles is largely unknown. Although, three genes of SB, B1, B2 and B3, have been reported (Palmer et al., 2004), only B1 has been mapped to a large region of Chr. 13 (Chen and Shoemaker, 1998), while the position of other two loci are unknown. In the present study, three genome regions had allelic differentiation between wild and cultivated soybean in SB, among which gSB13 was located in the region of B1, while the other two genes may be in the region of B2 or B3. So far, at least five genetic loci (I, T, W1, R, and O) are known to control SCC (Palmer et al., 2004), but only two loci that correspond to G and I were detected to have allelic differentiation between the wild and cultivated parents in the present study. The two loci explained almost the entire genetic variation of RILs with effects of green and black SCC, but the gene associated with brown SCC was not detected in the two RIL populations, which might be due to the low frequency of brown SCC. Meanwhile, we first detected the interaction between G and I in both populations at the molecular level.

Worldwide, soybean is grown at a wide range of latitudes from the equator to up to 50°, most likely due to genetic diversity at a large number of genes and QTLs controlling DTF. Several DTF genes, designated as E1–E8 (Bernard, 1971; Buzzell, 1971; Buzzell and Voldeng, 1980; Mcblain and Bernard, 1987; Bonato and Vello, 1999; Cober and Voldeng, 2001; Cober et al., 2010) and J (Ray et al., 1995), have been characterized by classical methods in cultivated soybeans. Of these, E1, E2, E3, and E4 have been cloned (Liu et al., 2008; Watanabe et al., 2009; Watanabe et al., 2011; Xia et al., 2012). Moreover, numerous QTLs for DTF were detected by linkage mapping and genome-wide association studies, of which 63 QTLs (Mansur et al., 1996; Orf et al., 1999; Yamanaka et al., 2000; Tasma et al., 2001; Funatsuki et al., 2005; Reinprecht et al., 2006) are published in SoyBase². Despite the economic importance of soybean, our knowledge about the molecular mechanisms of its flowering is still limited, especially in wild soybean. Here, we show the identification of DTF loci in a wild soybean and present a total of six QTLs that were identified in the two populations in multiple environments. Among them, qDTF10, qDTF16.2, and qDTF19 overlapped the cloned DTF genes E2 (Watanabe et al., 2011), GmFT5a and GmFT3a (Kong et al., 2010), E3 (Watanabe et al., 2009), respectively, in cultivated soybean, and qDTF16.1 was detected by Pooprompan et al. (2006) in a G. max cross, suggesting that allelic differentiation occurs not only in cultivated soybeans but also between G. max and G. soja in the four regions of DTF genes. The other two genes were newly detected in wild soybean; allelic differentiation in cultivated soybeans has been not detected but might occur only between G. max and G. soja, which may have the potential to broaden the genetic basis of DTF in cultivated soybean.

Author Contributions

WW and ML carried out the phenotyping and genotyping of the two RIL populations, participated in the construction of the populations and data analysis, and drafted the manuscript. YW participated in the sequencing library construction, SNP calling and bin-map construction. XL, SC, and LS participated in the phenotyping of the two populations, and JK participated in language editing. TZ and JG conceived the study, participated in its design and coordination, and helped to draft the manuscript. All of the authors read and approved the final manuscript.

Conflict of Interest Statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Acknowledgments

The work was funded by the National Natural Science Foundation of China (31271750, 31571691), MOE Program for Changjiang Scholars and Innovative Research Team in University (PCSIRT13073), and Jiangsu JCIC-MCP program.

Supplementary Material

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fpls.2016.01248

Footnotes

References

Bernard, R. L. (1971). Two major genes for time of flowering and maturity in soybeans. Crop Sci. 11, 242–244. doi: 10.2135/cropsci1971.0011183X001100020022x